You can achieve this with a ShingleFilter. This filter combines adjacent tokens into multi-word terms (shingles). For example, Abigail Adams National Bancorp with a ShingleFilter of up to 3 tokens will produce (assuming a simple WhitespaceAnalyzer) [Abigail], [Abigail Adams], [Abigail Adams National], [Adams], [Adams National], [Adams National Bancorp], [National], [National Bancorp] and [Bancorp].
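To make the token list above concrete, here is a minimal sketch that imitates what shingling does to a whitespace-tokenized string. It is plain Python, not the Lucene API: the function name `shingles` and its `max_size` parameter are made up for illustration, and real ShingleFilter output also carries position and offset information that this ignores.

```python
def shingles(text, max_size=3):
    """Return word n-grams of 1..max_size tokens, grouped by start position.

    Hypothetical helper for illustration only; it mimics the terms a
    shingle filter would emit, not Lucene's actual implementation.
    """
    tokens = text.split()  # whitespace tokenization, like a WhitespaceAnalyzer
    result = []
    for start in range(len(tokens)):
        for size in range(1, max_size + 1):
            if start + size <= len(tokens):
                result.append(" ".join(tokens[start:start + size]))
    return result


for term in shingles("Abigail Adams National Bancorp"):
    print(f"[{term}]")
```

The nine printed terms are the same nine listed above, and [National Bancorp] is among them, which is what makes the exact match possible.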
If a user searches for National Bancorp, you will get an exact match on the document National Bancorp itself and a lower-scoring exact match on Abigail Adams National Bancorp (lower because that field contains many more tokens, which lowers its length norm). I think it makes sense to return both documents for such a query.
You might also want to apply a ShingleFilter at query time, depending on the use case.
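As a rough illustration of shingling on both sides, the sketch below shingles the query and each document the same way and counts the terms they share. Everything here is hypothetical (`shingle_set`, `overlap`, the two-document list), and the counts are only a crude stand-in for real scoring, which Lucene computes differently.

```python
def shingle_set(text, max_size=3):
    """Set of word n-grams (1..max_size tokens); illustrative helper, not Lucene."""
    tokens = text.split()
    return {
        " ".join(tokens[i:i + n])
        for i in range(len(tokens))
        for n in range(1, max_size + 1)
        if i + n <= len(tokens)
    }


docs = ["National Bancorp", "Abigail Adams National Bancorp"]
query_terms = shingle_set("National Bancorp")  # shingles applied at query time


def overlap(doc):
    """Return (number of shared shingles, number of shingles in the document)."""
    doc_terms = shingle_set(doc)
    return len(query_terms & doc_terms), len(doc_terms)


for doc in docs:
    shared, total = overlap(doc)
    print(f"{doc!r}: {shared} shared shingles out of {total} indexed terms")
```

Both documents share the same three shingles with the query, but the longer name spreads them over nine indexed terms instead of three, which is the intuition behind it ranking second.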
wesen