Analyzed vs not_analyzed, or ...?

New to ES, so maybe a dumb question, but I'm trying to do a wildcard search, for example: "SOMECODE*" and "*SOMECODE"

It works fine, but the value in the document may be "SOMECODE/FRED" .
The problem is that * matches anything, including nothing, so *SOMECODE still gets a hit on SOMECODE/FRED .

I tried searching for */SOMECODE , but this returns nothing.
I think tokenization of the field is the problem, i.e. the / splits the value into two words.

I tried mapping the field to not_analyzed , but then I can't find it at all.

Am I doing it wrong?

thanks

+8
wildcard elasticsearch
1 answer

By setting not_analyzed , you only allow exact matches (for example, "SOMECODE/FRED" as a whole, including case and special characters).
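For illustration, a sketch of what a wildcard query against a not_analyzed field would have to look like (the index and field names here are hypothetical). Because the whole value is stored as a single verbatim term, only a pattern that covers the entire value, case included, can match:

```shell
# Hypothetical index "myindex" with a not_analyzed field "code" holding
# the verbatim term SOMECODE/FRED. The wildcard pattern must cover the
# entire value, including case and the slash:
curl -XGET 'localhost:9200/myindex/_search?pretty' -d '{
  "query": { "wildcard": { "code": "SOMECODE/*" } }
}'

# A lowercase pattern such as "somecode*" would NOT match here, because
# not_analyzed terms are stored without lowercasing at index time.
```

This is likely why the not_analyzed mapping appeared to "find nothing": wildcard query terms are not analyzed either, so the pattern must match the stored term exactly, character for character.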

I assume you are using the standard analyzer (it is the default if you do not specify one). In that case, standard treats the slash as a token separator and produces two tokens, [somecode] and [fred] :

 $ curl -XGET 'localhost:9200/_analyze?analyzer=standard&pretty' -d 'SOMECODE/FRED'
 {
   "tokens" : [ {
     "token" : "somecode",
     "start_offset" : 0,
     "end_offset" : 8,
     "type" : "<ALPHANUM>",
     "position" : 1
   }, {
     "token" : "fred",
     "start_offset" : 9,
     "end_offset" : 13,
     "type" : "<ALPHANUM>",
     "position" : 2
   } ]
 }

If you do not want this behavior, you need to switch to a tokenizer that does not split on special characters. However, I would question the use case for this. Typically, you will want to break on these types of characters.
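If you do want the slash kept, one option (a sketch; the index name is an assumption) is an index whose default analyzer uses the whitespace tokenizer, which splits only on whitespace:

```shell
# Create an index whose default analyzer splits on whitespace only,
# so SOMECODE/FRED stays a single token. Note the whitespace tokenizer
# does not lowercase, so the token keeps its original case.
curl -XPUT 'localhost:9200/myindex' -d '{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": { "type": "custom", "tokenizer": "whitespace" }
      }
    }
  }
}'

# Verify: the whole value should come back as one token.
curl -XGET 'localhost:9200/myindex/_analyze?pretty' -d 'SOMECODE/FRED'
```

With that analyzer, a */SOMECODE-style wildcard can match the single token, but you give up matching on the individual words.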

+13
