Find posts with the most tags in common, like related questions on StackOverflow

On StackOverflow, a question can be marked with several tags. I would like to know how to find the questions that share the most tags with a given question.

Suppose we have 100 questions in a database, and each question has several tags. When a user views a specific question, we want the system to display related questions on the page. The criterion for relatedness is the number of tags in common.

For example: Question 1 is labeled AAA, BBB, CCC, DDD, EEE.

Question 2 is the top related question, because it also has all 5 of those tags. Question 3 is the second most related, because it has only 4 or 3 of the tags that Question 1 has. ......

So my question is how to design the database so that the questions related to Question 1 can be found quickly. Thank you very much.

+5
3 answers

Perhaps something like:

select qt.question_id, count(*)
from   question_tags qt
where  qt.tag in
( select qt2.tag
  from   question_tags qt2
  where  qt2.question_id = 123
)
group by qt.question_id
order by 2 desc
+9

If you can guarantee that a question has no duplicate tags, you can do the following:

SELECT
     QT2.question_id,
     COUNT(*) AS cnt
FROM
     Question_Tags QT1
INNER JOIN Question_Tags QT2 ON QT2.tag = QT1.tag AND QT2.question_id <> QT1.question_id
WHERE
     QT1.question_id = @question_id
GROUP BY
     QT2.question_id
ORDER BY
     cnt DESC

If you cannot guarantee that tags are unique within a question, then Tony Andrews's solution will work. His query works in either case, but if you can guarantee uniqueness through constraints, you should compare its performance on your system against this method.
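To make the uniqueness guarantee concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table layout, the index name, and the sample rows are assumptions for illustration and are not taken from the question. A composite primary key on (question_id, tag) is one way to enforce "no duplicate tags", and the join query above is then run against the example data from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE question_tags (
    question_id INTEGER NOT NULL,
    tag         TEXT    NOT NULL,
    PRIMARY KEY (question_id, tag)  -- a question cannot carry the same tag twice
);
-- Assumed index: lets the join look up all questions that carry a given tag
CREATE INDEX idx_question_tags_tag ON question_tags (tag, question_id);
""")

# Hypothetical sample data modeled on the example in the question
rows = [
    (1, "AAA"), (1, "BBB"), (1, "CCC"), (1, "DDD"), (1, "EEE"),
    (2, "AAA"), (2, "BBB"), (2, "CCC"), (2, "DDD"), (2, "EEE"),
    (3, "AAA"), (3, "BBB"), (3, "CCC"), (3, "DDD"),
    (4, "EEE"), (4, "ZZZ"),
    (5, "ZZZ"),
]
conn.executemany("INSERT INTO question_tags VALUES (?, ?)", rows)


def related_questions(conn, question_id, limit=10):
    """Return (question_id, shared_tag_count) pairs, most shared tags first."""
    sql = """
        SELECT qt2.question_id, COUNT(*) AS cnt
        FROM question_tags qt1
        JOIN question_tags qt2
          ON qt2.tag = qt1.tag AND qt2.question_id <> qt1.question_id
        WHERE qt1.question_id = ?
        GROUP BY qt2.question_id
        ORDER BY cnt DESC, qt2.question_id
        LIMIT ?
    """
    return conn.execute(sql, (question_id, limit)).fetchall()


related = related_questions(conn, 1)
print(related)  # [(2, 5), (3, 4), (4, 1)]
```

Question 5 shares no tags with Question 1, so it does not appear in the result at all. Whether the extra index helps depends on the database engine and the data volume, so it is worth measuring on your own system.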

+3


Edit: is it about SO or about your own application? If you are talking about your own application, remove the SO tag as it is misleading.

Edit2: I would say something like:

SELECT * FROM `questions` WHERE `tag` LIKE '%tagname%' OR (looped for each tag) LIMIT 5

Here 5 is the maximum number of results you want to return (which at least gives some optimization). This is probably not the best solution, but I have seen it work.

You can also try a LIKE match on the title.
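One reason the LIKE approach is probably not the best solution is that substring matching also hits tags that merely contain the search term. The sketch below uses Python's sqlite3 module. The `questions` table with a comma-separated `tag` column is an assumption inferred from the query above, and the sample rows and the `like_search` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE questions (id INTEGER PRIMARY KEY, title TEXT, tag TEXT)")
conn.executemany(
    "INSERT INTO questions (id, title, tag) VALUES (?, ?, ?)",
    [
        (1, "Sorting a list", "java,sorting"),
        (2, "Closures explained", "javascript,closures"),
        (3, "Joining tables", "sql,join"),
    ],
)


def like_search(conn, tags, limit=5):
    """Return ids of questions whose tag column contains any of the given tags."""
    if not tags:
        return []
    # One "tag LIKE ?" clause per tag, joined with OR
    where = " OR ".join("tag LIKE ?" for _ in tags)
    params = [f"%{t}%" for t in tags] + [limit]
    sql = f"SELECT id FROM questions WHERE {where} ORDER BY id LIMIT ?"
    return [row[0] for row in conn.execute(sql, params)]


matches = like_search(conn, ["java"])
print(matches)  # [1, 2]
```

Searching for "java" also returns the question tagged "javascript", and the result says nothing about how many tags each question shares, so it cannot be ranked the way the question asks.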

0
