Difference between fuzzy like this and more like this?

What is the difference between Lucene MoreLikeThis (mlt) and FuzzyLikeThis (flt)?

I am evaluating both query types through Elasticsearch (ES), and they seem conceptually very similar:

  • mlt: compares the fields of an existing document with the fields of other documents
  • flt: compares a string with the fields of other documents
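To make the mlt side of this comparison concrete, here is a minimal sketch of the idea behind "more like this": score the terms of the source text by tf * idf, keep the top n "interesting" terms, and OR them together. The function names (`interesting_terms`, `more_like_this`), the whitespace tokenizer, and the tiny in-memory corpus are hypothetical illustrations, not Lucene's or ES's API; the IDF follows the shape of Lucene's classic formula.

```python
import math
from collections import Counter

def tokenize(text):
    # Naive lowercase/whitespace tokenizer (assumption; a real index uses an analyzer).
    return text.lower().split()

def idf(term, corpus):
    # Shaped like Lucene's classic IDF: log(numDocs / (docFreq + 1)) + 1
    doc_freq = sum(1 for doc in corpus if term in tokenize(doc))
    return math.log(len(corpus) / (doc_freq + 1)) + 1.0

def interesting_terms(like_text, corpus, max_query_terms=2):
    # Score each source term by tf * idf and keep the best n.
    tf = Counter(tokenize(like_text))
    scores = {term: freq * idf(term, corpus) for term, freq in tf.items()}
    ranked = sorted(scores, key=lambda t: (-scores[t], t))  # ties broken alphabetically
    return ranked[:max_query_terms]

def more_like_this(like_text, corpus, max_query_terms=2):
    # OR the selected terms together: any document containing one of them matches.
    terms = set(interesting_terms(like_text, corpus, max_query_terms))
    return [doc for doc in corpus if terms & set(tokenize(doc))]
```

With `corpus = ["the quick brown fox", "the lazy dog", "the quick cat"]`, the source text `"the quick fox fox"` yields the terms `fox` and `quick`; the common word `the` is dropped because its IDF is low. Note that only exact terms are matched here: nothing is fuzzified.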

However, in terms of performance, flt is about an order of magnitude slower than mlt.

I am using the latest ES, which in turn uses Lucene 4.5.


From the fuzzy like this docs:

Fuzzifies ALL terms provided as strings and then picks the best n differentiating terms. In effect, this mixes the behavior of FuzzyQuery and MoreLikeThis, but with special consideration of fuzzy scoring factors. This generally produces good results for queries where users may provide details in a number of fields, have no knowledge of Boolean query syntax, and also want a degree of fuzzy matching and a fast query.
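A rough sketch of the "fuzzify all terms, then keep the best n" step, assuming plain Levenshtein distance with at most 2 edits and ranking each (source term, index term) pair by a normalized similarity `1 - d / min(len)`. The function `fuzzify`, its parameters, and this ranking rule are hypothetical simplifications, not the exact logic of Lucene's FuzzyLikeThisQuery. The per-term edit-distance work it does is the kind of extra cost mlt never pays.

```python
def levenshtein(a, b):
    # Classic dynamic-programming edit distance (insert/delete/substitute cost 1).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def fuzzify(like_text, vocabulary, max_edits=2, max_query_terms=3):
    # Expand every source term into index terms within max_edits, score each
    # (source, variant) pair by normalized similarity, and keep the best n pairs.
    candidates = []
    for src in set(like_text.lower().split()):
        for variant in vocabulary:
            d = levenshtein(src, variant)
            if d > max_edits:
                continue
            sim = 1.0 - d / min(len(src), len(variant))
            if sim > 0:
                candidates.append((sim, src, variant))
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))  # best similarity first
    return [(src, variant) for _, src, variant in candidates[:max_query_terms]]
```

For example, with the index vocabulary `["lucene", "lucent", "search", "serch", "elastic"]`, the misspelled input `"lucine serch"` keeps `("serch", "serch")`, `("lucine", "lucene")` and `("serch", "search")`, while the weaker variant `lucent` is cut off by `max_query_terms`.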

For each source term, the fuzzy variants are held in a BooleanQuery with no coord factor (because we are not looking for matches on multiple variants in any one document). Additionally, a specialized TermQuery is used for variants and does not use that variant term's IDF, because this would favor rarer terms, such as misspellings. Instead, all variants use the same IDF ranking (the one for the source query term), and this is factored into the variant's boost. If the source query term does not exist in the index, the average IDF of the variants is used.
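The IDF rule in this paragraph can be sketched as follows, assuming the classic `log(numDocs / (docFreq + 1)) + 1` IDF and a precomputed similarity per variant. `shared_idf` and `variant_boosts` are hypothetical helpers, not Lucene classes, and the coord-free BooleanQuery part is not modeled. The point is that a rare misspelling such as `lucent` gets the source term's IDF, not its own higher one.

```python
import math

def idf(doc_freq, num_docs):
    # Classic-style IDF (assumption): rarer terms get larger values.
    return math.log(num_docs / (doc_freq + 1)) + 1.0

def shared_idf(source_term, variants, doc_freqs, num_docs):
    # One IDF used for every variant of source_term.
    if doc_freqs.get(source_term, 0) > 0:
        return idf(doc_freqs[source_term], num_docs)
    # Source term not in the index: fall back to the average IDF of its variants.
    vals = [idf(doc_freqs[v], num_docs) for v in variants if doc_freqs.get(v, 0) > 0]
    return sum(vals) / len(vals) if vals else 0.0

def variant_boosts(source_term, variants_with_sim, doc_freqs, num_docs):
    # Each variant's boost = its fuzzy similarity * the shared IDF;
    # the variant's own IDF is deliberately ignored.
    shared = shared_idf(source_term, [v for v, _ in variants_with_sim], doc_freqs, num_docs)
    return {v: sim * shared for v, sim in variants_with_sim}
```

With `doc_freqs = {"lucene": 9, "lucent": 1}` and 10 documents, the boosts for source term `lucene` depend only on similarity, while a source term missing from the index, such as `lucine`, falls back to the mean IDF of `lucene` and `lucent`.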

1 answer

, . " ", , .

" " like_text fields. , , . , , , , .

" " , . , , like_text, , like_text . , , - , , Lucene 4.x .

