Is there any label matching algorithm?

My system is similar to stackoverflow. In principle, a message can have several tags, and there is a search function that finds messages with corresponding request tags (all tags must be matched)

I wonder if there are any algorithms / data structure that effectively solve the tag / message search problem? Which one is most effective in terms of speed (time complexity)?

+4
source share
2 answers

In the past, I did not use specialized DSs for this. Infact, if you want to do this using an RDBMS, kindly read the details of how Wordpress does this using taxonomies . Basically, you will have a separate tag table, and then individual messages can contain several tags (using keys).

Another popular approach is to treat your problem as a facet problem. You should use the full-text indexing infrastructure and develop your faceted view on top of this. Here is a great post from the creator of Lucene / Solr , which explains this very case. Using facsimile scanning, you can display something that stackoverflow does:

algorithm × 21165 search × 8863 data-structures × 5867 tags × 2886 stackoverflow × 721 
+2
source

The most efficient way to store such search data is usually in the Inverted Index . In addition, this is what the most common search engines / search engines are built on.

For a real implementation of this, I suggest you take a look at Apache Lucene .

0
source

All Articles