Find exact matches using the Lucene Search API

I am working on a company search API using Lucene. My Lucene company index has 2 companies: 1. Abigail Adams National Bancorp, Inc. 2. National Bancorp

If the user types the name National Bancorp, then only company No. 2 (i.e. National Bancorp) should be returned, and not No. 1; that is, only exact matches must be returned. How can I achieve this?

Thanks for reading.

+6
lucene
4 answers

You can use KeywordAnalyzer to index and search this field. The keyword analyzer generates a single token for the entire string, so only a query identical to the whole field value matches.
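The idea can be sketched without Lucene itself. The plain-Java example below only mimics the two tokenization behaviors; it is not Lucene's API, and the class and method names are made up for illustration. A whitespace-style analyzer yields one token per word, while a keyword-style analyzer yields the whole string as a single token, so only an identical string can match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only: mimics tokenization, does not use Lucene.
public class KeywordTokenSketch {

    // Whitespace-style analysis: one token per word.
    public static List<String> whitespaceTokens(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    // Keyword-style analysis: the entire string is a single token.
    public static List<String> keywordTokens(String text) {
        return Collections.singletonList(text);
    }

    // Returns the names that share at least one token with the query.
    public static List<String> search(List<String> names, String query, boolean keywordStyle) {
        List<String> queryTokens = keywordStyle ? keywordTokens(query) : whitespaceTokens(query);
        List<String> hits = new ArrayList<>();
        for (String name : names) {
            List<String> nameTokens = keywordStyle ? keywordTokens(name) : whitespaceTokens(name);
            if (!Collections.disjoint(nameTokens, queryTokens)) {
                hits.add(name);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
                "Abigail Adams National Bancorp, Inc.", "National Bancorp");
        // Word-level tokens: both names share the token "National" with the query.
        System.out.println(search(names, "National Bancorp", false));
        // Whole-string tokens: only the identical name matches.
        System.out.println(search(names, "National Bancorp", true));
    }
}
```

Because the whole string is one token, the match is sensitive to case and punctuation unless the value is normalized the same way at index time and at query time.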

+11

This is something a shingle filter can handle. A shingle filter combines adjacent words into multi-word tokens. For example, Abigail Adams National Bancorp with a ShingleFilter of up to 3 words will produce (assuming a simple WhitespaceAnalyzer) [Abigail], [Abigail Adams], [Abigail Adams National], [Adams], [Adams National], [Adams National Bancorp], [National], [National Bancorp] and [Bancorp].

If a user queries for National Bancorp, you will get an exact match on National Bancorp itself and a lower-scoring match on Abigail Adams National Bancorp (lower because that field has many more tokens, which reduces its length normalization). I think it makes sense to return both documents for such a query.

You might also want to apply a shingle filter at query time, depending on the use case.
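The token list above can be reproduced with a few lines of plain Java. This is only a sketch of the idea and does not use Lucene's ShingleFilter; the class name `ShingleSketch` and the method `shingles` are hypothetical. It emits every run of 1 to `maxSize` adjacent words, in position order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: builds word shingles without Lucene.
public class ShingleSketch {

    // Returns all runs of 1..maxSize adjacent words, grouped by start position.
    public static List<String> shingles(String text, int maxSize) {
        String[] words = text.trim().split("\\s+");
        List<String> out = new ArrayList<>();
        for (int start = 0; start < words.length; start++) {
            StringBuilder sb = new StringBuilder();
            for (int len = 1; len <= maxSize && start + len <= words.length; len++) {
                if (len > 1) {
                    sb.append(' ');
                }
                sb.append(words[start + len - 1]);
                out.add(sb.toString());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> longName = shingles("Abigail Adams National Bancorp", 3);
        List<String> shortName = shingles("National Bancorp", 3);
        System.out.println(longName);
        System.out.println(shortName);
        // Both names contain the two-word shingle, so both would match the query.
        System.out.println(longName.contains("National Bancorp")
                && shortName.contains("National Bancorp"));
    }
}
```

The shorter name yields 3 tokens and the longer one 9, which is why a length-sensitive scoring scheme would tend to rank the shorter name higher for the same matching shingle.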

+1

I searched Google a lot for the same problem without success. After some head-scratching, I found a solution: search for the string in double quotes, and that will solve your problem.

A search for National Bancorp will return both #1 and #2, but "National Bancorp" will return only #2.

+1

You may want to reconsider your requirements, assuming I have understood your question correctly. Please bear with me if I have misunderstood.

Just some food for thought:

  • If you only want exact matches, then why search in the first place?

  • Are you sure the user expects exact matches? I usually search on the assumption that the search engine will fill in the missing words.

  • Suppose the user searched for National Bancorp, but National Bancorp was no longer in your index. Would you want Abigail Adams National Bancorp, Inc. to be excluded from the results simply because it is not an exact match?

In light of this, I would advise you to keep presenting all possible matches (exact or not) to the user and let them decide which is most suitable. I say this simply because you may not think the way all of your users do. Lucene will rank the closest matches highest, which helps users choose faster.

0
