What algorithm does StackOverflow use to find similar questions?

I need to create a customer help desk on the website I am creating, and I like how StackOverflow finds similar questions. Does anyone know which algorithm the site uses, and can you provide any links where I can find it?

2 answers

There is a whole branch of Machine Learning called clustering (a type of unsupervised learning) that deals with these types of problems.

The new question becomes part of a cluster, and other questions in the same cluster (probably ordered by a similarity or distance measure) are displayed as similar questions.
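As a rough illustration of the "ordered by similarity" part, here is a minimal sketch that skips the clustering step entirely and simply ranks existing questions by TF-IDF cosine similarity to a new one. It is not what StackOverflow actually runs; the function names (`tokenize`, `tfidf_vectors`, `cosine`, `most_similar`) and the sample help-desk questions are made up for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    token_lists = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        # Smoothed IDF, so terms found in every document keep a small weight.
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query, questions, top_n=3):
    """Return up to top_n (score, question) pairs, most similar first."""
    vectors = tfidf_vectors(questions + [query])
    query_vec = vectors[-1]
    scored = [(cosine(query_vec, vec), q) for vec, q in zip(vectors, questions)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]


# Hypothetical help-desk questions, for illustration only.
existing = [
    "How do I reset my password?",
    "How can I change my email address?",
    "Where can I download my invoice?",
]
for score, question in most_similar("I forgot my password, how do I reset it?", existing):
    print(f"{score:.2f}  {question}")
```

For a real help desk you would probably swap this for a library implementation or a search engine, but the ranking idea stays the same.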

There are various features that can be used for clustering, some of which may be:

  • Tags
  • Words in title
  • Words in the text (less weight than the title)
  • Links to other questions / web pages.

etc.
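The features listed above could be combined into a single score along these lines. This is only a sketch under stated assumptions: the weights in `WEIGHTS` are invented for illustration (tags highest, body text lowest, matching the "less weight than the title" note), Jaccard overlap is just one possible per-feature measure, and the links feature is left out for brevity.

```python
import re

# Hypothetical weights: tags count most, then title words, then body words.
WEIGHTS = {"tags": 0.5, "title": 0.3, "body": 0.2}


def words(text):
    """Return the set of lowercase alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard overlap of two sets: |a & b| / |a | b| (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity(q1, q2, weights=WEIGHTS):
    """Weighted sum of per-feature Jaccard overlaps between two questions."""
    return (
        weights["tags"] * jaccard(set(q1["tags"]), set(q2["tags"]))
        + weights["title"] * jaccard(words(q1["title"]), words(q2["title"]))
        + weights["body"] * jaccard(words(q1["body"]), words(q2["body"]))
    )


# Hypothetical questions, for illustration only.
a = {"tags": ["billing"], "title": "Download my invoice", "body": "Where is the PDF?"}
b = {"tags": ["billing"], "title": "Invoice missing", "body": "I cannot find my invoice."}
c = {"tags": ["account"], "title": "Reset password", "body": "Login fails."}
print(similarity(a, b) > similarity(a, c))
```

Good weights are usually found by trying them against real data rather than chosen up front.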

Other engineered features, derived with techniques such as text summarization, sentiment analysis, etc., may also be used in such problems. Which features work well, and why, depends on the problem.

Other areas where you see these algorithms in action:

  • Youtube
  • Wikipedia
  • IMDB

and the list goes on and on.

So what can you do about your problem?

There is no single answer to this. It all depends on your data and on what you want to achieve. But you can still

  • Learn the feature engineering aspects of Machine Learning.
  • Learn about clustering.

(There are many online courses for this.)

or

  • Hire a person who knows this stuff.

Most likely weighted tag matching, and possibly a MATCH() or an equivalent full-text search weighted toward the title.

The details are probably on Meta somewhere, or in the FAQ.
