MySQL Group similar rows based on records in a second table

I wasn't sure what to call this.

I have several tables structured like this:

Sentence table

id |    sentence       | ...
----------------------------
1  | See Spot run      | ...
2  | See Jane run      | ...
3  | Jane likes cheese | ...

Word table

id | word (unique)
----------
1  | See
2  | Spot
3  | run
4  | Jane
5  | likes
6  | cheese

And the table "word_references"

sentence_id | word_id
---------------------
          1 | 1 
          1 | 2
          1 | 3
          2 | 1
          2 | 3
          2 | 4
          3 | 4
          3 | 5
          3 | 6
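
For anyone who wants to reproduce this, here is a minimal setup sketch. The table names `sentences` and `words`, the column types, and the keys are my assumptions; only `word_references` and the column names shown above come from my actual schema.

```sql
-- Hypothetical table names and column types; only word_references is named above.
CREATE TABLE sentences (
    id       INT PRIMARY KEY,
    sentence VARCHAR(255) NOT NULL
);

CREATE TABLE words (
    id   INT PRIMARY KEY,
    word VARCHAR(64) NOT NULL UNIQUE
);

-- Assumption: a word is recorded at most once per sentence,
-- so (sentence_id, word_id) can serve as the primary key.
CREATE TABLE word_references (
    sentence_id INT NOT NULL,
    word_id     INT NOT NULL,
    PRIMARY KEY (sentence_id, word_id),
    FOREIGN KEY (sentence_id) REFERENCES sentences (id),
    FOREIGN KEY (word_id)     REFERENCES words (id)
);

INSERT INTO sentences (id, sentence) VALUES
    (1, 'See Spot run'),
    (2, 'See Jane run'),
    (3, 'Jane likes cheese');

INSERT INTO words (id, word) VALUES
    (1, 'See'), (2, 'Spot'), (3, 'run'),
    (4, 'Jane'), (5, 'likes'), (6, 'cheese');

INSERT INTO word_references (sentence_id, word_id) VALUES
    (1, 1), (1, 2), (1, 3),
    (2, 1), (2, 3), (2, 4),
    (3, 4), (3, 5), (3, 6);
```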

I want to return a list of pairs of sentences that are similar to each other, based on the words they have in common, sorted by similarity. So the query should return:

one | two | similarity
----------------------
 1  |  2  |  2
 2  |  3  |  1

because sentences 1 and 2 share two words, "See" and "run", while sentences 2 and 3 share one word, "Jane".

2 answers

This query should solve your problem:

SELECT r1.sentence_id AS one,
       r2.sentence_id AS two,
       COUNT(*)       AS similarity
FROM   word_references r1
       INNER JOIN word_references r2
               ON r1.sentence_id < r2.sentence_id
                  AND r1.word_id = r2.word_id
GROUP  BY r1.sentence_id,
          r2.sentence_id
ORDER  BY similarity DESC

This gives:

one | two | similarity
----------------------
 1  |  2  |  2
 2  |  3  |  1
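
To see where these counts come from, it can help to look at the joined rows before they are grouped. The sketch below is only an illustration: it runs the same self-join without the GROUP BY, and the alias `shared_word_id` is a name I made up. Each output row is one word that two sentences have in common.

```sql
-- Same self-join as above, but without grouping:
-- one row per (sentence pair, shared word).
SELECT r1.sentence_id AS one,
       r2.sentence_id AS two,
       r1.word_id     AS shared_word_id
FROM   word_references r1
       INNER JOIN word_references r2
               ON r1.sentence_id < r2.sentence_id
              AND r1.word_id = r2.word_id
ORDER  BY one, two, shared_word_id;

-- With the sample data from the question this returns:
--   one | two | shared_word_id
--    1  |  2  |  1     (See)
--    1  |  2  |  3     (run)
--    2  |  3  |  4     (Jane)
```

Counting these rows per (one, two) pair gives the similarity column.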

If you change the condition r1.sentence_id < r2.sentence_id to r1.sentence_id <> r2.sentence_id, each pair is returned in both directions:

one | two | similarity
----------------------
 1  |  2  |  2
 2  |  3  |  1
 2  |  1  |  2
 3  |  2  |  1

Another option, which returns each pair in both directions:

select w1.sentence_id, w2.sentence_id, count(*) as similarity
from word_references w1
inner join word_references w2 on w1.word_id = w2.word_id and w1.sentence_id <> w2.sentence_id
group by w1.sentence_id, w2.sentence_id
order by count(*) desc

Result:

+ ---------------- + ---------------- + --------------- +
| sentence_id      | sentence_id      | similarity      |
+ ---------------- + ---------------- + --------------- +
| 1                | 2                | 2               |
| 2                | 1                | 2               |
| 3                | 2                | 1               |
| 2                | 3                | 1               |
+ ---------------- + ---------------- + --------------- +
4 rows