You need to understand how elasticsearch analyzers work. An analyzer performs tokenization (splitting the input into tokens, for example on whitespace) and applies a set of token filters (which remove tokens you don't want, such as stop words, or modify tokens, such as the lowercase token filter, which converts everything to lowercase).
Analysis happens at two very specific times: at index time (when you put your documents into elasticsearch) and, depending on the query you use, at search time (on the string you are searching for).
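By default the same analyzer runs at both of those times, but you can control them independently in the mapping. A minimal sketch (the index and field names are made up, and the syntax is for recent Elasticsearch versions, not necessarily the one current when this answer was written):

```json
PUT /my_index
{
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "analyzer": "standard",
        "search_analyzer": "standard"
      }
    }
  }
}
```

Here `analyzer` is applied when documents are indexed, and `search_analyzer` is applied to the query string when you search that field.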
The default analyzer is the standard analyzer, which consists of a standard tokenizer, a standard token filter (to clean up the tokens produced by the standard tokenizer), a lowercase token filter, and a stop token filter.
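You can see exactly what an analyzer produces with the _analyze API. A quick sketch (the request-body form shown here is the one used by newer Elasticsearch versions; older versions passed the analyzer and text as URL parameters, and the example sentence is made up):

```json
GET /_analyze
{
  "analyzer": "standard",
  "text": "The QUICK Brown Foxes."
}
```

The response lists the tokens that would actually be stored, which is the easiest way to check why a search does or does not match.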
To give an example, when you index the string “I love Vincent’s pie!” using the default standard analyzer, what actually gets stored are the tokens "i", "love", "vincent", "s", "pie". If you then search for “Vincent's” with a term query (which is not analyzed), you will find nothing, because “Vincent's” is not one of those tokens! However, if you search for “Vincent's” with a match query (which is analyzed), you will find “I love Vincent’s pie!”, because the tokens "vincent" and "s" match.
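As a sketch of that difference (the index and field names below are made up and assume the document above was indexed into an analyzed content field), this term query finds nothing, because the exact string "Vincent's" is compared against the stored tokens without being analyzed:

```json
GET /my_index/_search
{
  "query": {
    "term": { "content": "Vincent's" }
  }
}
```

whereas this match query analyzes "Vincent's" into "vincent" and "s" first, and therefore finds the document:

```json
GET /my_index/_search
{
  "query": {
    "match": { "content": "Vincent's" }
  }
}
```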
Bottom line:

- Use an analyzed query, such as match, when searching natural-language strings.
- Customize your analyzers to fit your needs. You can configure a custom analyzer that uses a whitespace tokenizer, a letter tokenizer, or a pattern tokenizer if you want to get fancy, along with whatever token filters you need (a sketch of such a setup follows after this list). It depends on your use case, but if you are dealing with natural-language sentences I would not recommend it, because the standard tokenizer was built with natural-language search in mind.
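If you do go the custom-analyzer route, it is configured in the index settings and then referenced from the mapping. A minimal sketch (index, analyzer, and field names are made up; the syntax is for recent Elasticsearch versions) that tokenizes on whitespace and lowercases the result:

```json
PUT /my_custom_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_whitespace_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [ "lowercase" ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "analyzer": "my_whitespace_analyzer"
      }
    }
  }
}
```

Swap in the letter or pattern tokenizer, and add whatever token filters your use case calls for.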
See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis.html for more details.
Andrew Macheret