Short answer
You need to check your mapping and see whether you are using the fast-vector-highlighter. Even then, you still need to be careful with your queries.
Detailed response
Suppose you are running a fresh instance of ES 0.20.4 on localhost.
Building on your example, let's add explicit mappings. Note that I am setting up two different sub-field mappings for the code field. The only difference between them is "term_vector":"with_positions_offsets".
curl -X PUT localhost:9200/myindex -d '
{
  "settings" : {
    "index" : {
      "number_of_replicas" : 0,
      "number_of_shards" : 1,
      "analysis" : {
        "analyzer" : {
          "default" : {
            "type" : "custom",
            "tokenizer" : "keyword",
            "filter" : [ "lowercase", "my_ngram" ]
          }
        },
        "filter" : {
          "my_ngram" : {
            "type" : "nGram",
            "min_gram" : 1,
            "max_gram" : 20
          }
        }
      }
    }
  },
  "mappings" : {
    "product" : {
      "properties" : {
        "code" : {
          "type" : "multi_field",
          "fields" : {
            "code" : {
              "type" : "string",
              "analyzer" : "default",
              "store" : "yes"
            },
            "code.ngram" : {
              "type" : "string",
              "analyzer" : "default",
              "store" : "yes",
              "term_vector" : "with_positions_offsets"
            }
          }
        }
      }
    }
  }
}'
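To make it easier to see which terms this analyzer puts into the index, here is a rough sketch that emulates it in plain shell: the whole input is kept as one token (keyword tokenizer), lowercased, and then split into every substring of 1 to 20 characters (nGram filter). The helpers ngrams and has_term are hypothetical names made up for this illustration, not part of Elasticsearch. The sketch assumes ASCII input, and the real filter may emit the tokens in a different order.

```shell
# Rough emulation of the "default" analyzer defined above:
# keyword tokenizer -> lowercase -> nGram(min_gram=1, max_gram=20).
# ngrams and has_term are hypothetical helpers; ASCII input is assumed.
ngrams() {
  # Lowercase the whole input; the keyword tokenizer keeps it as one token.
  text=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  len=${#text}
  start=1
  while [ "$start" -le "$len" ]; do
    size=1
    # Emit every substring of length 1..20 that starts at this position.
    while [ "$size" -le 20 ] && [ $((start + size - 1)) -le "$len" ]; do
      printf '%s\n' "$text" | cut -c "${start}-$((start + size - 1))"
      size=$((size + 1))
    done
    start=$((start + 1))
  done
}

# A term query is not analyzed, so it only matches when the search text
# is exactly equal to one of the indexed tokens.
has_term() {
  ngrams "$1" | grep -Fx -- "$2" > /dev/null
}
```

For example, has_term "Samsung Galaxy Mini" "y m" succeeds because "y m" is one of the lowercase n-grams, while has_term "Samsung Galaxy Mini" "Y M" fails because no indexed token contains uppercase letters.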
Index some documents.
curl -X POST 'localhost:9200/myindex/product' -d '{ "code" : "Samsung Galaxy i7500" }'
curl -X POST 'localhost:9200/myindex/product' -d '{ "code" : "Samsung Galaxy 5 Europa" }'
curl -X POST 'localhost:9200/myindex/product' -d '{ "code" : "Samsung Galaxy Mini" }'
And now we can run queries.
1) Search for "i" to see that a one-character search works with highlighting
curl -X GET 'localhost:9200/myindex/product/_search?pretty' -d '
{
  "fields" : [ "code" ],
  "query" : { "term" : { "code" : "i" } },
  "highlight" : {
    "number_of_fragments" : 0,
    "fields" : {
      "code" : {},
      "code.ngram" : {}
    }
  }
}'
This returns two hits:
# 1
...
"fields" : {
  "code" : "Samsung Galaxy Mini"
},
"highlight" : {
  "code.ngram" : [ "Samsung Galaxy M<em>i</em>n<em>i</em>" ],
  "code" : [ "Samsung Galaxy M<em>i</em>n<em>i</em>" ]
}

# 2
...
"fields" : {
  "code" : "Samsung Galaxy i7500"
},
"highlight" : {
  "code.ngram" : [ "Samsung Galaxy <em>i</em>7500" ],
  "code" : [ "Samsung Galaxy <em>i</em>7500" ]
}
Both code and code.ngram were highlighted correctly this time. But with a longer query, things change quickly:
2) Search for 'y m'
curl -X GET 'localhost:9200/myindex/product/_search?pretty' -d '
{
  "fields" : [ "code" ],
  "query" : { "term" : { "code" : "y m" } },
  "highlight" : {
    "number_of_fragments" : 0,
    "fields" : {
      "code" : {},
      "code.ngram" : {}
    }
  }
}'
This gives:
"fields" : {
  "code" : "Samsung Galaxy Mini"
},
"highlight" : {
  "code.ngram" : [ "Samsung Galax<em>y M</em>ini" ],
  "code" : [ "Samsung Galaxy Min<em>y M</em>i" ]
}
The code field is not highlighted correctly (similar to your case), while code.ngram, which stores term vectors with positions and offsets, is.
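If only the correctly highlighted variant is wanted, one option is to request highlighting for the term-vector sub-field alone. The sketch below assumes the myindex index and the product documents created earlier in this answer. The ES_URL and BODY variables and the search_ngram_highlight function are names made up for this example.

```shell
# Same term query as above, but highlighting only the sub-field that was
# mapped with "term_vector":"with_positions_offsets".
# ES_URL, BODY and search_ngram_highlight are hypothetical names; the index
# and documents are assumed to exist as created earlier in this answer.
ES_URL='localhost:9200/myindex/product/_search?pretty'
BODY='{
  "fields" : [ "code" ],
  "query" : { "term" : { "code" : "y m" } },
  "highlight" : {
    "number_of_fragments" : 0,
    "fields" : { "code.ngram" : {} }
  }
}'

search_ngram_highlight() {
  curl -X GET "$ES_URL" -d "$BODY"
}
```

Calling search_ngram_highlight against the running node sends the request. Going by the output shown above, the code.ngram entry is the one to rely on.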
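One important point: the queries above use a term query instead of query_string. A term query is not analyzed, so the search text is matched as-is against the indexed n-grams.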