What algorithm does StackOverflow use to find similar questions?

Question

I need to create a customer help desk on the website I am creating, and I like how StackOverflow finds similar questions. Does anyone know which algorithm the site uses, and can you provide any links where I can find it?

algorithm

Dreamwave Apr 24 '13 at 15:52

2 answers

Most likely weighted tag matching, and possibly a MATCH() full-text query or an equivalent full-text search weighted on the title.

They probably have the details about it on Meta or in the FAQ somewhere.
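To make this answer's idea concrete, here is a minimal Python sketch of weighted tag matching combined with a weighted title-word match. The weights, the Jaccard overlap measure, the question dictionary layout, and the function names (similarity, similar_questions) are all hypothetical illustrations, not Stack Overflow's actual implementation. A real site would more likely push the title matching into its database's full-text search.

```python
import re

# Hypothetical weights: a shared tag counts more than a shared title word.
# These are illustrative values, not Stack Overflow's.
TAG_WEIGHT = 2.0
TITLE_WEIGHT = 1.0


def title_terms(title: str) -> set[str]:
    """Lowercase word tokens of a title, ignoring words of 1-2 characters."""
    return {w for w in re.findall(r"[a-z0-9+#]+", title.lower()) if len(w) > 2}


def jaccard(a: set[str], b: set[str]) -> float:
    """Set overlap from 0.0 (disjoint) to 1.0 (identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity(q1: dict, q2: dict) -> float:
    """Weighted sum of tag overlap and title-word overlap."""
    tag_score = jaccard(set(q1["tags"]), set(q2["tags"]))
    title_score = jaccard(title_terms(q1["title"]), title_terms(q2["title"]))
    return TAG_WEIGHT * tag_score + TITLE_WEIGHT * title_score


def similar_questions(new_q: dict, existing: list[dict], top_n: int = 5) -> list[dict]:
    """Rank existing questions by similarity to new_q, dropping zero scores."""
    scored = [(similarity(new_q, q), q) for q in existing]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [q for _, q in scored[:top_n]]


existing = [
    {"id": 1, "title": "How do I reset my password?", "tags": ["account", "password"]},
    {"id": 2, "title": "Refund for a duplicate order", "tags": ["billing"]},
    {"id": 3, "title": "Password reset email never arrives", "tags": ["password", "email"]},
]
new = {"title": "Cannot reset password", "tags": ["password"]}
print([q["id"] for q in similar_questions(new, existing)])  # prints [1, 3]
```

Weighting tags above title words reflects the idea that two questions sharing a tag are likely related even when they are worded differently.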

Dave Apr 24 '13 at 15:54

Sailesh · Accepted Answer · 2013-04-24T17:53:01+0000

There is a whole branch of machine learning called clustering (a type of unsupervised learning) that deals with this kind of problem.

Each question is assigned to a cluster, and the other questions in the same cluster (probably ordered by some similarity or distance measure) are displayed as similar questions.
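The answer does not name a specific clustering algorithm, so the sketch below uses one common choice, k-means, assigning each question to the centroid with the highest cosine similarity over sparse word-count vectors. It then lists a question's cluster-mates from most to least similar. Everything here (the dictionary-based vectors, starting from the first k vectors as centroids, the function names kmeans and similar_in_cluster) is an assumption for illustration. A production system would more likely use a library implementation with better initialization.

```python
import math


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity of two sparse vectors stored as {feature: weight}."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def mean_vector(vectors: list[dict]) -> dict:
    """Element-wise average of sparse vectors (missing features count as 0)."""
    total: dict = {}
    for vec in vectors:
        for k, v in vec.items():
            total[k] = total.get(k, 0.0) + v
    return {k: v / len(vectors) for k, v in total.items()}


def kmeans(vectors: list[dict], k: int, iterations: int = 10) -> list[int]:
    """k-means variant that assigns by cosine similarity; returns one label per vector.

    Uses the first k vectors as starting centroids so results are deterministic.
    """
    if not 0 < k <= len(vectors):
        raise ValueError("k must be between 1 and the number of vectors")
    centroids = [dict(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its most similar centroid.
        labels = [max(range(k), key=lambda c: cosine(vec, centroids[c])) for vec in vectors]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [vec for vec, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = mean_vector(members)
    return labels


def similar_in_cluster(index: int, vectors: list[dict], labels: list[int]) -> list[int]:
    """Indices of the other members of index's cluster, most similar first."""
    same = [i for i, lab in enumerate(labels) if lab == labels[index] and i != index]
    return sorted(same, key=lambda i: cosine(vectors[index], vectors[i]), reverse=True)


vectors = [
    {"password": 1, "reset": 1},
    {"refund": 1, "order": 1},
    {"password": 1, "email": 1},
    {"refund": 1, "billing": 1},
    {"password": 1, "reset": 1, "email": 1},
]
labels = kmeans(vectors, k=2)
print(similar_in_cluster(0, vectors, labels))  # prints [4, 2]
```

In practice the clusters would be computed offline over the whole question set, so showing similar questions at request time is just a cluster lookup plus a sort.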

There are various features that can be used for clustering, some of which may be:

Tags
Words in title
Words in the text (less weight than the title)
Links to other questions or web pages

etc.
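As a rough illustration of turning these features into something a clustering algorithm can use, the Python sketch below builds one sparse vector per question, keyed by feature family and token. The numeric weights, the question dictionary layout, and the function name feature_vector are hypothetical; the only part taken from the list above is that body words count less than title words.

```python
import re
from collections import Counter

# Hypothetical weights per feature family. The list above only says that
# body words should weigh less than title words.
WEIGHTS = {"tag": 3.0, "title": 2.0, "body": 1.0, "link": 1.5}

WORD_RE = re.compile(r"[a-z0-9+#]+")
LINK_RE = re.compile(r"https?://\S+")


def words(text: str) -> list[str]:
    """Lowercase word tokens."""
    return WORD_RE.findall(text.lower())


def feature_vector(question: dict) -> Counter:
    """Sparse vector keyed by 'family:token', e.g. 'tag:password'."""
    vec: Counter = Counter()
    for tag in question.get("tags", []):
        vec["tag:" + tag.lower()] += WEIGHTS["tag"]
    for w in words(question.get("title", "")):
        vec["title:" + w] += WEIGHTS["title"]
    body = question.get("body", "")
    for url in LINK_RE.findall(body):
        vec["link:" + url] += WEIGHTS["link"]
    # Strip links before tokenizing so URL fragments are not counted as words.
    for w in words(LINK_RE.sub(" ", body)):
        vec["body:" + w] += WEIGHTS["body"]
    return vec


q = {
    "tags": ["Password"],
    "title": "Reset password",
    "body": "See https://example.com/help and reset",
}
v = feature_vector(q)
print(v["tag:password"], v["title:reset"], v["body:reset"])  # prints 3.0 2.0 1.0
```

Prefixing each key with its family keeps "reset" in a title distinct from "reset" in a body, so the weights stay separate when vectors are compared.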

There may also be other engineered features, built with methods such as text summarization or sentiment analysis, that are used in problems like this. Which features work well depends on the problem.

Other areas where you see these algorithms in action:

YouTube
Wikipedia
IMDB

and the list goes on and on.

So what can you do about your problem?

There is no single answer to this. It all depends on your data and what you want to achieve. Still, you can:

Learn about feature engineering in machine learning.
Learn about clustering.

(There are many online courses for this.)

or

Hire a person who knows this stuff.
