What is the difference between fuzzy-like-this and more-like-this?

Question

What is the difference between Lucene's MoreLikeThis (mlt) and FuzzyLikeThisQuery (flt)?

I am evaluating both query types through Elasticsearch (ES), and I find them conceptually very similar:

mlt: compares the fields of an existing document with the fields of other documents
flt: compares a string with the fields of other documents

However, in terms of performance, flt is about an order of magnitude slower than the mlt query.

I am using the latest ES, which in turn uses Lucene 4.5.
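
For reference, the two requests being compared can be sketched roughly as below. This is only an illustration, assuming the query DSL of that Elasticsearch generation (`more_like_this` and `fuzzy_like_this`, each taking `fields`, `like_text`, and `max_query_terms`); the helper names, the field names `title` and `body`, and the sample text are hypothetical.

```python
import json


def mlt_body(like_text, fields, max_query_terms=25):
    """Build a more_like_this request body (assumed 0.90-era DSL)."""
    return {
        "query": {
            "more_like_this": {
                "fields": list(fields),
                "like_text": like_text,
                "min_term_freq": 1,
                "max_query_terms": max_query_terms,
            }
        }
    }


def flt_body(like_text, fields, max_query_terms=25):
    """Build a fuzzy_like_this request body (assumed 0.90-era DSL)."""
    return {
        "query": {
            "fuzzy_like_this": {
                "fields": list(fields),
                "like_text": like_text,
                "max_query_terms": max_query_terms,
            }
        }
    }


if __name__ == "__main__":
    # Hypothetical field names and text, for illustration only.
    text = "lucene similarity search"
    print(json.dumps(mlt_body(text, ["title", "body"]), indent=2))
    print(json.dumps(flt_body(text, ["title", "body"]), indent=2))
```

Both bodies carry the same text and fields, so any timing difference comes from how each query expands and scores its terms rather than from the input.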

Fuzzifies ALL terms provided as strings and then picks the best n differentiating terms. In effect, this mixes the behavior of FuzzyQuery and MoreLikeThis, but with special consideration of fuzzy scoring factors. This generally produces good results for queries where users may provide details in a number of fields, have no knowledge of Boolean query syntax, and also want a degree of fuzzy matching and a fast query.
For each source term, the fuzzy variants are held in a BooleanQuery with no coord factor (because we are not looking for matches on multiple variants in any one document). Additionally, a specialized TermQuery is used for variants that does not use the variant term's IDF, because this would favor rarer terms such as misspellings. Instead, all variant terms use the same IDF ranking (the one for the source query term), and this is factored into the variant's boost. If the source query term does not exist in the index, the average IDF of the variants is used.
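
The shared-IDF rule in the passage above can be sketched as follows. This is a simplified illustration written from that description, not Lucene's actual code: the function names and the sample document frequencies are hypothetical, and the IDF formula is the classic `1 + ln(numDocs / (docFreq + 1))` form used by Lucene's default TF-IDF similarity.

```python
import math


def idf(doc_freq, num_docs):
    """Classic TF-IDF style inverse document frequency."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def variant_idfs(source_term, variants, doc_freqs, num_docs):
    """Assign one shared IDF to every fuzzy variant of source_term.

    If the source term occurs in the index, its IDF is used for all
    variants. Otherwise the average IDF of the variants that do occur
    is used, as the quoted description states.
    """
    source_df = doc_freqs.get(source_term, 0)
    if source_df > 0:
        shared = idf(source_df, num_docs)
    else:
        present = [idf(doc_freqs[v], num_docs)
                   for v in variants if doc_freqs.get(v, 0) > 0]
        shared = sum(present) / len(present) if present else 0.0
    return {v: shared for v in variants}


if __name__ == "__main__":
    # Hypothetical index statistics: "lucine" is a rare misspelling.
    freqs = {"lucene": 100, "lucine": 1, "lucent": 5}
    print(variant_idfs("lucene", ["lucene", "lucine", "lucent"], freqs, 1000))
```

With per-variant IDF, the rare misspelling `lucine` would score far higher than the correctly spelled source term; sharing the source term's IDF removes that bias.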

elasticsearch lucene similarity fuzzy-search morelikethis

miku Oct 14 '13 at 16:25

1 answer

javanna · Accepted Answer · 2013-10-17T08:47:53+0000

, . " ", , .

" " like_text fields. , , . , , , , .

" " , . , , like_text, , like_text . , , - , , Lucene 4.x .
