Elasticsearch case-insensitive filter/search on a not_analyzed field

A similar question has been asked in "Elasticsearch map case insensitive to not_analyzed documents", but mine is a little different because I am dealing with special characters.

Most people recommend using the keyword analyzer in combination with a lowercase filter. However, this does not work for my case, because the keyword analyzer tokenizes special characters such as ^, #, etc. into spaces, which breaks the kind of matching I am going for.

i.e.

  • ^HELLOWORLD should match ^HELLOWORLD, but not helloworld.
  • #FooBar should match #FooBar, but not foobar.
  • Foo Bar should match Foo Bar, but not foo or bar.
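
For illustration, this is roughly what the default standard analyzer does to these values: it strips the special characters and splits on whitespace, which is exactly what I need to avoid (calls are illustrative, output abbreviated to the tokens):

 curl "localhost:9200/_analyze?pretty&analyzer=standard" -d "^HELLOWORLD"
 # token produced: "helloworld"   (the ^ is stripped)

 curl "localhost:9200/_analyze?pretty&analyzer=standard" -d "Foo Bar"
 # tokens produced: "foo", "bar"  (split on the space)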

Similar functionality to what we see here: https://www.elastic.co/guide/en/elasticsearch/guide/current/_finding_exact_values.html#_term_filter_with_numbers , but with case insensitivity.
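
To make the goal concrete, here is a rough sketch of the kind of request I have in mind (the index and field names are made up here). Since a term filter is not analyzed, the value would have to be lowercased on the client side so that it lines up with the lowercased tokens in the index:

 curl -XPOST "localhost:9200/my_index/_search?pretty" -d '{
   "query": {
     "filtered": {
       "filter": {
         "term": { "productID": "^helloworld" }
       }
     }
   }
 }'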

Does anyone know how to do this?

EDIT 1:

It seems the essence of my problem is the multi_field mapping, since keyword + lowercase does resolve the question asked in the title. It would be more accurate to ask this question about a multi_field property.

test_mapping.json:

 { "properties" : { "productID1" : { "type" : "string", "index_analyzer" : "keyword_lowercase", "search_analyzer" : "keyword_lowercase" }, "productID2" : { "type": "multi_field", "keyword_edge_ID": { "type": "string", "index_analyzer":"keyword_lowercase_edge", "search_analyzer":"keyword_lowercase_edge" }, "productID2": { "type": "string", "index": "analyzed", "store": "yes", "index_analyzer":"keyword_lowercase", "search_analyzer":"keyword_lowercase" } } } } 

test.json:

 { "index": { "analysis": { "filter":{ "edgengramfilter": { "type": "edgeNgram", "side": "front", "min_gram": 1, "max_gram": 32 } }, "analyzer": { "keyword_lowercase" : { "type" : "custom", "tokenizer": "keyword", "filter": "lowercase" }, "keyword_lowercase_edge": { "tokenizer": "keyword", "filter": ["lowercase", "edgengramfilter"] } } } } } 

Shell script to create an index with mappings:

 #!/bin/sh
 ES_URL="http://localhost:9200"

 curl -XDELETE $ES_URL/test
 curl -XPOST $ES_URL/test/ --data-binary @test.json
 curl -XPOST $ES_URL/test/query/_mapping --data-binary @test_mapping.json

POST localhost:9200/test/query:

 {
   "productID1": "^A",
   "productID2": "^A"
 }

I would like "^A" to match on productID2 with the query below, but right now it returns no results, while the same request against productID1 works:

 {
   "query": {
     "match": { "productID2": "^A" }
   }
 }
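
For reference, the identical request against productID1 does return the document:

 curl -XPOST "localhost:9200/test/query/_search?pretty" -d '{
   "query": {
     "match": { "productID1": "^A" }
   }
 }'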

1 answer

As you can see from the examples below, the keyword tokenizer combined with the lowercase filter does exactly that: it lowercases the whole value while preserving all spaces and special characters. An example of how to use it can be found in this answer.

 curl "localhost:9200/_analyze?pretty&tokenizer=keyword&filters=lowercase" -d "^HELLOWORLD" { "tokens" : [ { "token" : "^helloworld", "start_offset" : 0, "end_offset" : 11, "type" : "word", "position" : 1 } ] } curl "localhost:9200/_analyze?pretty&tokenizer=keyword&filters=lowercase" -d "#FooBar" { "tokens" : [ { "token" : "#foobar", "start_offset" : 0, "end_offset" : 7, "type" : "word", "position" : 1 } ] } curl "localhost:9200/_analyze?pretty&tokenizer=keyword&filters=lowercase" -d "Foo Bar" { "tokens" : [ { "token" : "foo bar", "start_offset" : 0, "end_offset" : 7, "type" : "word", "position" : 1 } ] } 
