Elasticsearch: query for multiple words in multiple fields (prefixed)

I am trying to implement an autocomplete mechanism driven by an ES index. The index has several fields, and I want to be able to query multiple fields using the AND operator while allowing partial matches (prefix only).

As an example, let's say I have two fields that I want to query: "color" and "animal". I would like to be able to serve queries such as "duc", "duck", "purpl", "purple", and "purple duck". I managed to get all of this working using multi_match with the and operator.
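Roughly the kind of query I have working now; the index name and exact options here are just for illustration:

# Illustration only: "demo_index" is an assumed index name; the real point
# is querying both fields with operator "and".
POST /demo_index/_search
{
  "query": {
    "multi_match": {
      "query": "purple duck",
      "fields": ["color", "animal"],
      "operator": "and"
    }
  }
}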

What I seem unable to do is match queries such as "purple duc", since multi_match does not allow wildcards.

I looked at match_phrase_prefix, but as I understand it, it does not apply to multiple fields.
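For reference, this is the kind of single-field query I mean (index name again just for illustration); it handles the prefix, but only against one field at a time:

# Handles the prefix ("duc"), but only against a single field,
# not across "color" AND "animal" together.
POST /demo_index/_search
{
  "query": {
    "match_phrase_prefix": {
      "animal": "duc"
    }
  }
}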

That brings me to tokenizers: it feels like the solution may lie there, which ultimately leads to the following questions:

1) Can anyone confirm whether there is a ready-made feature that does what I want? This seems like a fairly common pattern, so something may already exist out of the box.

2) Can anyone suggest a solution? Are tokenizers part of it? I am more than happy to be pointed in the right direction and do further research myself. Obviously, if anyone has a working solution to share, that would be great.

Thanks in advance - F

Tags: autocomplete, elasticsearch, autosuggest
2 answers

I actually wrote a blog post about this a while ago for Qbox, which you can find here: http://blog.qbox.io/multi-field-partial-word-autocomplete-in-elasticsearch-using-ngrams. (Unfortunately, some of the links in the post are broken and not easy to fix at the moment, but hopefully you get the idea.)

I'll refer you to the post for the details, but here is some code you can use for quick testing. Note that I use edge ngrams instead of full ngrams.

Also note, in particular, the use of the _all field and the operator in the match query.

Ok, so here is the mapping:

PUT /test_index
{
  "settings": {
    "analysis": {
      "filter": {
        "edgeNGram_filter": {
          "type": "edgeNGram",
          "min_gram": 2,
          "max_gram": 20
        }
      },
      "analyzer": {
        "edgeNGram_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "asciifolding",
            "edgeNGram_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "_all": {
        "enabled": true,
        "index_analyzer": "edgeNGram_analyzer",
        "search_analyzer": "standard"
      },
      "properties": {
        "field1": {
          "type": "string",
          "include_in_all": true
        },
        "field2": {
          "type": "string",
          "include_in_all": true
        }
      }
    }
  }
}
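If you want to sanity-check what the analyzer produces, the _analyze API is useful. The request below is the JSON-body form accepted by newer Elasticsearch releases (older 1.x versions take the analyzer and text as URL parameters instead); it should return edge ngrams such as "pu", "pur", ..., "purple", "du", "duc", "duck":

# Optional sanity check; request syntax varies by ES version
GET /test_index/_analyze
{
  "analyzer": "edgeNGram_analyzer",
  "text": "purple duck"
}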

Now add some documents:

POST /test_index/doc/_bulk
{"index":{"_id":1}}
{"field1":"purple duck","field2":"brown fox"}
{"index":{"_id":2}}
{"field1":"slow purple duck","field2":"quick brown fox"}
{"index":{"_id":3}}
{"field1":"red turtle","field2":"quick rabbit"}

And this query seems to illustrate what you want:

POST /test_index/_search
{
  "query": {
    "match": {
      "_all": {
        "query": "purp fo slo",
        "operator": "and"
      }
    }
  }
}

which returns:

 { "took": 5, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 1, "max_score": 0.19930676, "hits": [ { "_index": "test_index", "_type": "doc", "_id": "2", "_score": 0.19930676, "_source": { "field1": "slow purple duck", "field2": "quick brown fox" } } ] } } 

Here is the code I used to test it:

http://sense.qbox.io/gist/b87e426062f453d946d643c7fa3d5480cd8e26ec


Elasticsearch 6.0.0 breaks Sloan Arens' answer above, because include_in_all has been deprecated.
Use copy_to instead.

PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "first_name": {
          "type": "text",
          "copy_to": "full_name"
        },
        "last_name": {
          "type": "text",
          "copy_to": "full_name"
        },
        "full_name": {
          "type": "text"
        }
      }
    }
  }
}

PUT my_index/_doc/1
{
  "first_name": "John",
  "last_name": "Smith"
}

GET my_index/_search
{
  "query": {
    "match": {
      "full_name": {
        "query": "John Smith",
        "operator": "and"
      }
    }
  }
}

The example above is taken from the Elasticsearch documentation for copy_to.
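To get behavior similar to the accepted answer on 6.x, one option is to point copy_to at a catch-all field that uses the edge-ngram analyzer. The following is an untested sketch that combines the two answers; the field and analyzer names are my own, not something from the docs:

# Untested sketch for ES 6.x: copy_to into a catch-all field analyzed
# with edge ngrams; field and analyzer names are illustrative.
PUT /test_index
{
  "settings": {
    "analysis": {
      "filter": {
        "edge_ngram_filter": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 20
        }
      },
      "analyzer": {
        "edge_ngram_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": ["lowercase", "asciifolding", "edge_ngram_filter"]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "field1": { "type": "text", "copy_to": "combined" },
        "field2": { "type": "text", "copy_to": "combined" },
        "combined": {
          "type": "text",
          "analyzer": "edge_ngram_analyzer",
          "search_analyzer": "standard"
        }
      }
    }
  }
}

# Query the catch-all field instead of _all
POST /test_index/_search
{
  "query": {
    "match": {
      "combined": {
        "query": "purp fo slo",
        "operator": "and"
      }
    }
  }
}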

