Elasticsearch wildcard search on a not_analyzed field

Question

I have an index with the following settings and mapping:

{ "settings":{ "index":{ "analysis":{ "analyzer":{ "analyzer_keyword":{ "tokenizer":"keyword", "filter":"lowercase" } } } } }, "mappings":{ "product":{ "properties":{ "name":{ "analyzer":"analyzer_keyword", "type":"string", "index": "not_analyzed" } } } } }

I am struggling to implement a wildcard search on the name field. My sample data:

 [ {"name": "SVF-123"}, {"name": "SVF-234"} ]

When I execute the following query:

 curl -XPOST 'http://localhost:9200/my_index/product/_search' -d '{ "query": { "filtered" : { "query" : { "query_string" : { "query": "*SVF-1*" } } } } }'

It returns both SVF-123 and SVF-234. I think it is still tokenizing the data. It should return only SVF-123.

Could you help with this?

Thanks in advance

search tokenize elasticsearch lucene

Hüseyin baby Jan 16 '14 at 11:53

4 answers

There are a couple of things going wrong here.

First, you say that you do not want the terms analyzed at index time. Then an analyzer is configured (which is used at search time) that generates incompatible terms. (They are lowercased.)
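The mismatch can be illustrated with a small Python simulation. This is only a sketch: `analyzer_keyword` below is a stand-in for the keyword tokenizer plus lowercase filter, not Elasticsearch code, and it assumes (as described above) that the field stores the raw value while the analyzer is applied only to the search text.

```python
def analyzer_keyword(text):
    """Stand-in for keyword tokenizer + lowercase filter: one lowercased token."""
    return [text.lower()]

# not_analyzed: the value is stored as a single term, exactly as given.
indexed_terms = ["SVF-123"]

# Search time: the query text goes through the analyzer first.
query_terms = analyzer_keyword("SVF-123")

print(query_terms)                      # ['svf-123']
print(query_terms[0] in indexed_terms)  # False
```

The lowercased search term never equals the stored uppercase term, which is why index-time and search-time handling have to agree.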

By default, all terms end up in the _all field, analyzed with the standard analyzer. That is the field you end up searching. Since it tokenizes on "-", you get an OR of "*SVF" and "1*".
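A rough Python simulation shows why both documents come back. The assumptions are mine, not Elasticsearch internals: `standard_like_tokens` only approximates the standard analyzer (split on non-alphanumerics, lowercase), the query is taken to be split into the two lowercased wildcard terms, and `fnmatchcase` stands in for wildcard term matching.

```python
import re
from fnmatch import fnmatchcase

def standard_like_tokens(text):
    """Rough approximation: split on non-alphanumerics and lowercase."""
    return [t.lower() for t in re.split(r"[^0-9A-Za-z]+", text) if t]

def matches_any(doc_tokens, patterns):
    """OR semantics: one pattern matching one token is enough."""
    return any(fnmatchcase(tok, pat) for tok in doc_tokens for pat in patterns)

docs = ["SVF-123", "SVF-234"]
patterns = ["*svf", "1*"]  # assumed result of splitting "*SVF-1*" on "-"

hits = [d for d in docs if matches_any(standard_like_tokens(d), patterns)]
print(hits)  # ['SVF-123', 'SVF-234']
```

Both documents contain the token `svf`, so the `*svf` half of the OR matches each of them regardless of the digits.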

Try running a terms facet on _all and on name to see what is going on.

Here is a runnable Play and gist: https://www.found.no/play/gist/3e5fcb1b4c41cfc20226 ( https://gist.github.com/alexbrasetvik/3e5fcb1b4c41cfc20226 )

You need to make sure that the terms you index are compatible with what you search for. You probably want to disable _all, as it can obscure what is going on.

    #!/bin/bash

    export ELASTICSEARCH_ENDPOINT="http://localhost:9200"

    # Create indexes
    curl -XPUT "$ELASTICSEARCH_ENDPOINT/play" -d '{
      "settings": {
        "analysis": {
          "text": [ "SVF-123", "SVF-234" ],
          "analyzer": {
            "analyzer_keyword": {
              "type": "custom",
              "tokenizer": "keyword",
              "filter": [ "lowercase" ]
            }
          }
        }
      },
      "mappings": {
        "type": {
          "properties": {
            "name": {
              "type": "string",
              "index": "not_analyzed",
              "analyzer": "analyzer_keyword"
            }
          }
        }
      }
    }'

    # Index documents
    curl -XPOST "$ELASTICSEARCH_ENDPOINT/_bulk?refresh=true" -d '
    {"index":{"_index":"play","_type":"type"}}
    {"name":"SVF-123"}
    {"index":{"_index":"play","_type":"type"}}
    {"name":"SVF-234"}
    '

    # Do searches

    # See all the generated terms.
    curl -XPOST "$ELASTICSEARCH_ENDPOINT/_search?pretty" -d '
    {
      "facets": {
        "name": { "terms": { "field": "name" } },
        "_all": { "terms": { "field": "_all" } }
      }
    }
    '

    # Analyzed, so no match
    curl -XPOST "$ELASTICSEARCH_ENDPOINT/_search?pretty" -d '
    {
      "query": { "match": { "name": { "query": "SVF-123" } } }
    }
    '

    # Not analyzed according to `analyzer_keyword`, so matches. (Note: term, not match)
    curl -XPOST "$ELASTICSEARCH_ENDPOINT/_search?pretty" -d '
    {
      "query": { "term": { "name": { "value": "SVF-123" } } }
    }
    '

    curl -XPOST "$ELASTICSEARCH_ENDPOINT/_search?pretty" -d '
    {
      "query": { "term": { "_all": { "value": "svf" } } }
    }
    '

Alex Brasetvik Jan 16 '14 at 14:18

Adding to Hüseyin's answer, we can use AND as the default operator. That way SVF and 1* are combined with the AND operator, which gives the correct results.

 "query": { "filtered" : { "query" : { "query_string" : { "default_operator": "AND", "analyze_wildcard": true, "query": "*SVF-1*" } } } }

Viduranga wijesooriya Sep 01 '16 at 10:09

@Viduranga Wijesooriya, as you said, "default_operator" : "AND" checks for the presence of both SVF and 1, but an exact match is still not possible. It does filter the results more appropriately, keeping every combination of SVF and 1 and sorting the results by relevance, so SVF-1 will rank higher.
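A small Python simulation makes the limitation concrete. It is a sketch under my own assumptions: the field is tokenized on non-alphanumerics and lowercased, the query is reduced to the two wildcard terms `*svf` and `1*`, and the value `100-SVF` is made up for illustration.

```python
import re
from fnmatch import fnmatchcase

def tokens(text):
    """Rough approximation of an analyzed field: split and lowercase."""
    return [t.lower() for t in re.split(r"[^0-9A-Za-z]+", text) if t]

def matches_all(doc_tokens, patterns):
    """AND semantics: every pattern must match at least one token."""
    return all(any(fnmatchcase(tok, pat) for tok in doc_tokens) for pat in patterns)

patterns = ["*svf", "1*"]
docs = ["SVF-123", "SVF-234", "100-SVF"]  # "100-SVF" is a hypothetical value

hits = [d for d in docs if matches_all(tokens(d), patterns)]
print(hits)  # ['SVF-123', '100-SVF']
```

AND drops SVF-234, but `100-SVF` still matches even though it does not contain the substring "SVF-1", because token order and adjacency are not checked.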

To get the exact result, use these settings and mapping:

 "settings": { "analysis": { "analyzer": { "analyzer_keyword": { "type": "custom", "tokenizer": "keyword", "filter": [ "lowercase" ] } } } }, "mappings": { "type": { "properties": { "name": { "type": "string", "analyzer": "analyzer_keyword" } } } }

and this query:

 { "query": { "bool": { "must": [ { "query_string" : { "fields": ["name"], "query" : "*svf-1*", "analyze_wildcard": true } } ] } } }

Result:

 { "took": 4, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 1, "max_score": 1, "hits": [ { "_index": "play", "_type": "type", "_id": "AVfXzn3oIKphDu1OoMtF", "_score": 1, "_source": { "name": "SVF-123" } } ] } }

Rajan Oct 18 '16 at 13:05

Hüseyin baby · Accepted Answer · 2014-01-17T12:30:24+0000

My solution adventure

I began with the setup shown in my question. Whenever I changed one part of my settings, one thing started working but another stopped. Here is the history of my solution:

1.) At first I indexed my data with the defaults, which means it was analyzed by default. That caused a problem on my side. For example:

When a user searches for a keyword such as SVF-1, the system runs this query:

 { "query": { "filtered" : { "query" : { "query_string" : { "analyze_wildcard": true, "query": "*SVF-1*" } } } } }

and the results are:

 SVF-123 SVF-234

This is expected, because the name field of my documents is analyzed. The query is split into the tokens SVF and 1, and SVF matches both documents even though 1 does not. I abandoned this approach and created a mapping that makes my fields not_analyzed:

 { "mappings":{ "product":{ "properties":{ "name":{ "type":"string", "index": "not_analyzed" }, "site":{ "type":"string", "index": "not_analyzed" } } } } }

but the problem remained.

2.) After a lot of research I wanted to try a different approach and decided to use a wildcard query. My query:

 { "query": { "wildcard" : { "name" : { "value" : "*SVF-1*" } } }, "filter":{ "term": {"site":"pro_en_GB"} } }

This query worked, but there is one problem. My fields are no longer analyzed, and I am running a wildcard query, so case sensitivity becomes an issue. If I search for svf-1, it returns nothing, and users may well enter a lowercase version of the query.
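The case problem can be reproduced outside Elasticsearch with a minimal Python sketch. It assumes the not_analyzed field holds the value verbatim as a single term and uses `fnmatchcase` as a stand-in for case-sensitive wildcard matching.

```python
from fnmatch import fnmatchcase

indexed_term = "SVF-123"  # not_analyzed: stored verbatim as one term

print(fnmatchcase(indexed_term, "*SVF-1*"))  # True
print(fnmatchcase(indexed_term, "*svf-1*"))  # False
```

The pattern is compared against the stored term as-is, so a lowercase pattern cannot match an uppercase term.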

3.) I changed the structure of my document to:

 { "mappings":{ "product":{ "properties":{ "name":{ "type":"string", "index": "not_analyzed" }, "nameLowerCase":{ "type":"string", "index": "not_analyzed" }, "site":{ "type":"string", "index": "not_analyzed" } } } } }

I added a second field for the name, called nameLowerCase. When I index a document, I set its fields as follows:

 { "name": "SVF-123", "nameLowerCase": "svf-123", "site": "pro_en_GB" }

At search time I convert the query keyword to lowercase and search on the new nameLowerCase field, while still displaying the name field.
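The lowercase-on-both-sides idea can be sketched in Python as two helpers. The function names `to_document` and `build_query` are hypothetical, the query shape simply mirrors the one used in this answer, and wildcard characters typed by the user are not escaped here.

```python
import json

def to_document(name, site):
    """Build the document to index, adding a lowercased copy of the name."""
    return {"name": name, "nameLowerCase": name.lower(), "site": site}

def build_query(keyword, site):
    """Lowercase the user's keyword and wrap it in wildcards."""
    return {
        "query": {
            "wildcard": {"nameLowerCase": {"value": "*" + keyword.lower() + "*"}}
        },
        "filter": {"term": {"site": site}},
    }

print(json.dumps(to_document("SVF-123", "pro_en_GB")))
print(json.dumps(build_query("SVF-1", "pro_en_GB")))
```

Because both the stored copy and the pattern are lowercased, `SVF-1` and `svf-1` produce the same query.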

Final version of my query:

 { "query": { "wildcard" : { "nameLowerCase" : { "value" : "*svf-1*" } } }, "filter":{ "term": {"site":"pro_en_GB"} } }

Now it works. There is also another way to solve this problem, using multi_field. Note that my query contains a dash (-), which is what caused the problems.
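For the multi_field route, a mapping along these lines might work. This is an untested sketch that assumes the legacy `multi_field` type of the Elasticsearch releases current at the time and the `analyzer_keyword` analyzer defined in the question; the sub-field name `lowercase` is hypothetical. It is written as a Python dict so it can be printed as JSON.

```python
import json

# Hypothetical mapping: the default sub-field keeps the raw value,
# while "lowercase" is indexed through analyzer_keyword (keyword + lowercase).
mapping = {
    "product": {
        "properties": {
            "name": {
                "type": "multi_field",
                "fields": {
                    "name": {"type": "string", "index": "not_analyzed"},
                    "lowercase": {"type": "string", "analyzer": "analyzer_keyword"},
                },
            }
        }
    }
}

print(json.dumps({"mappings": mapping}, indent=2))
```

A wildcard query would then target `name.lowercase`. Wildcard patterns are not analyzed, so the keyword still has to be lowercased before it is sent.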

Many thanks to @Alex Brasetvik for the detailed explanations and efforts.
