Elasticsearch - Word Search with Apostrophe

I want to be able to search for the following words:

Vincent, Vincents, Vincent's

Currently the text in the database and in ES is Vincent's

Is it possible to detect the possessive as well as ignore the apostrophe? I looked at the word delimiter token filter, but I cannot find a good explanation of it.

+7
3 answers

You need to understand how Elasticsearch analyzers work. An analyzer consists of a tokenizer, which splits the input into a stream of tokens (for example, on whitespace), and a set of token filters, which remove tokens you do not want (for example, stop words) or modify tokens (for example, the lowercase token filter, which converts everything to lower case).

Analysis happens at two very specific times: during indexing (when you put documents into Elasticsearch) and, depending on the type of query, during search (on the string you are searching for).

The default analyzer is the standard analyzer, which consists of the standard tokenizer, the standard token filter (which cleans up the tokens from the standard tokenizer), the lowercase token filter, and the stop token filter.

As an example, when you index the string "I love Vincent's pie!" in Elasticsearch with the default analyzer, you actually store the tokens "i", "love", "vincent", "s", "pie". (The exact tokens depend on the Elasticsearch version and analyzer configuration.) If you then search for "Vincent's" with a term query (which is not analyzed), you will not find anything, because "Vincent's" is not one of these tokens. However, if you search for "Vincent's" with a match query (which is analyzed), you will find "I love Vincent's pie!", because the query is broken into "vincent" and "s", and both match.
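The difference between an unanalyzed and an analyzed query can be sketched with a toy inverted index. This is only an illustration in plain Python, not Elasticsearch's real implementation: the `analyze` function is a hypothetical stand-in that splits on non-letters and lowercases, chosen so that it reproduces the token list given above, and `term_query` and `match_query` are made-up helper names.

```python
import re

def analyze(text):
    """Toy stand-in for an analyzer: split on non-letters, then lowercase."""
    return [tok.lower() for tok in re.findall(r"[A-Za-z]+", text)]

def build_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for token in analyze(text):
            index.setdefault(token, set()).add(doc_id)
    return index

def term_query(index, term):
    """Look the term up exactly as given; no analysis is applied."""
    return index.get(term, set())

def match_query(index, text):
    """Analyze the query text first, then return docs matching any token."""
    hits = set()
    for token in analyze(text):
        hits |= index.get(token, set())
    return hits

docs = {1: "I love Vincent's pie!"}
index = build_index(docs)

print(sorted(index))                     # ['i', 'love', 'pie', 's', 'vincent']
print(term_query(index, "Vincent's"))    # set(): the raw string is not a token
print(match_query(index, "Vincent's"))   # {1}: "vincent" and "s" both match
```

The term lookup fails only because the query string never goes through the same analysis as the indexed text; searching for the already-analyzed token "vincent" with `term_query` would succeed.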

The bottom line:

  • Use an analyzed query, such as match, when searching for natural-language strings.
  • Customize the analyzers to suit your needs. You can configure a custom analyzer that uses the whitespace tokenizer, the letter tokenizer, or, if you want to get complicated, the pattern tokenizer, together with any token filters you need. It depends on your use case, but if you are dealing with natural-language sentences I do not recommend this, because the standard tokenizer was designed for natural-language text.

See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis.html for more details.

+4

Use the "possessive_english" stemmer, as described in the ES docs: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-stemmer-tokenfilter.html

Example:

 {
   "index": {
     "analysis": {
       "analyzer": {
         "my_analyzer": {
           "tokenizer": "standard",
           "filter": ["standard", "lowercase", "my_stemmer"]
         }
       },
       "filter": {
         "my_stemmer": {
           "type": "stemmer",
           "name": "possessive_english"
         }
       }
     }
   }
 }
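To make the effect of a possessive stemmer concrete, here is a rough sketch of the idea in plain Python. It is an approximation written for illustration, not the actual Lucene filter, and `strip_possessive` is a hypothetical helper name: it simply drops a trailing 's written with either a straight or a typographic apostrophe.

```python
def strip_possessive(token):
    """Drop a trailing 's (straight or curly apostrophe), if present."""
    for suffix in ("'s", "\u2019s", "'S", "\u2019S"):
        if token.endswith(suffix):
            return token[: -len(suffix)]
    return token

tokens = ["Vincent's", "Vincents", "Vincent"]
print([strip_possessive(t).lower() for t in tokens])
# ['vincent', 'vincents', 'vincent']
```

Note that under this sketch "Vincents" is left unchanged, since it has no apostrophe to strip; matching it against "Vincent" would need some additional normalization beyond possessive handling.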

This code is untested, but it should work. Here is an example with "word_delimiter":

 {
   "index": {
     "analysis": {
       "analyzer": {
         "my_analyzer": {
           "tokenizer": "standard",
           "filter": ["standard", "lowercase", "my_word_delimiter"]
         }
       },
       "filter": {
         "my_word_delimiter": {
           "type": "word_delimiter",
           "preserve_original": "true"
         }
       }
     }
   }
 }

Works for me :-) ES docs: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-word-delimiter-tokenfilter.html
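One way to check which tokens either analyzer actually produces is the `_analyze` API. The sketch below only builds the request with the Python standard library; it assumes a cluster at http://localhost:9200 and an index called `my_index` created with one of the settings above (both names are placeholders, not from the original posts), and it assumes a reasonably recent Elasticsearch version that accepts a JSON body with `analyzer` and `text`.

```python
import json
import urllib.request

def build_analyze_request(base_url, index, analyzer, text):
    """Build (without sending) a POST request for the index-level _analyze API."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def analyze_tokens(request):
    """Send the request and return the token strings from the response."""
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return [t["token"] for t in payload["tokens"]]

# Placeholder host and index name; adjust to your own cluster.
req = build_analyze_request(
    "http://localhost:9200", "my_index", "my_analyzer", "Vincent's"
)
print(req.full_url)
print(req.data.decode("utf-8"))
# analyze_tokens(req)  # call this only when a cluster is actually running
```

Comparing the returned tokens for "Vincent", "Vincents" and "Vincent's" shows directly whether a given filter chain makes the three forms match.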

+3

Search for the word with the apostrophe using a match query similar to this:

 {
   "query": {
     "bool": {
       "must": [
         { "match": { "_all": "Vincent Vincents Vincent's" } }
       ]
     }
   }
 }

0
