In my Elasticsearch index, I have documents with multiple tokens in the same position.
I want to return a document only when the query matches at least one token in each position. The order of the tokens is not important. How can I do this? I am using Elasticsearch 0.90.5.
Example:
I index a document like this:
{ "field":"red car" }
I use a synonym token filter, which adds synonyms at the same position as the original token. So the field now has 2 positions:
- Position 1: "red"
- Position 2: "car", "automobile"
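The position layout above can be pictured with a small toy model. This is only a sketch in Python, not Elasticsearch code: the `analyze` function and the `SYNONYMS` map are hypothetical names, and the map assumes the synonym of "car" is "automobile", as the query string in this question suggests.

```python
# Toy model of a synonym token filter (not Elasticsearch code).
# SYNONYMS is a hypothetical synonym map chosen to match the example.
SYNONYMS = {"car": ["automobile"]}

def analyze(text, synonyms=SYNONYMS):
    """Return one set of tokens per position, starting at position 1."""
    positions = []
    for word in text.lower().split():
        # The original token and its synonyms share the same position.
        positions.append({word, *synonyms.get(word, [])})
    return positions

for pos, tokens in enumerate(analyze("red car"), start=1):
    print(pos, sorted(tokens))
# 1 ['red']
# 2 ['automobile', 'car']
```

Each set is one position, so "match at least one token in each position" means the query must hit every set at least once.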
My solution at the moment:
To make sure every position is matched, I also index the maximum position with the document.
{ "field":"red car", "max_position": 2 }
I have a custom similarity that extends DefaultSimilarity and returns 1 for tf(), idf() and lengthNorm(). The resulting score is the number of matching terms in the field.
Query:
{ "custom_score": { "query": { "match": { "field": "a car is an automobile" } }, "script": "_score*100/doc[\"max_position\"]+_score" }, "min_score":"100" }
The problem with my solution:
The search above should not match the document, because the query string does not contain "red". But it does match, because Elasticsearch counts the hits for "car" and "automobile" as two matches. That gives a score of 2, which leads to a script score of 102, and that satisfies the "min_score".
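To make the arithmetic concrete, here is a toy reproduction in Python. It is not Elasticsearch code: the function names are hypothetical, the similarity is assumed to contribute exactly 1 per distinct matching term as described above, and the synonym of "car" is assumed to be "automobile". The last function only states the condition I actually want, it is not a proposed query.

```python
# Toy reproduction of the scoring described above (not Elasticsearch code).
# Assumes the flat similarity: each distinct matching term contributes 1.

def flat_score(doc_positions, query_terms):
    """Number of distinct indexed terms that also occur in the query."""
    indexed_terms = set().union(*doc_positions)
    return len(indexed_terms & set(query_terms))

def script_score(score, max_position):
    """The custom_score script: _score*100/max_position + _score."""
    return score * 100 / max_position + score

def positions_covered(doc_positions, query_terms):
    """The desired condition: at least one hit in every position."""
    query = set(query_terms)
    return all(tokens & query for tokens in doc_positions)

doc = [{"red"}, {"car", "automobile"}]   # positions 1 and 2
query = "a car is an automobile".split()

score = flat_score(doc, query)           # 2: "car" and "automobile"
final = script_score(score, len(doc))    # 2*100/2 + 2 = 102.0
print(score, final, final >= 100)        # 2 102.0 True
print(positions_covered(doc, query))     # False: position 1 is unmatched
```

The term count reaches the number of positions even though position 1 has no hit, so the "min_score" check passes when it should not.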
Danyg