Multi-field, multi-word match without query_string

I would like to be able to match a multi-word search against several fields, where each searched word may be contained in any of the fields, in any combination. The catch is that I would like to avoid using query_string.

    curl -X POST "http://localhost:9200/index/document/1" -d '{"id":1,"firstname":"john","middlename":"clark","lastname":"smith"}'
    curl -X POST "http://localhost:9200/index/document/2" -d '{"id":2,"firstname":"john","middlename":"paladini","lastname":"miranda"}'

I would like the search for "John Smith" to match only document 1. The following query does what I need, but I would prefer to avoid query_string in case the user passes "OR", "AND" or any of the other advanced options.

    curl -X GET 'http://localhost:9200/index/_search?per_page=10&pretty' -d '{
      "query": {
        "query_string": {
          "query": "john smith",
          "default_operator": "AND",
          "fields": [ "firstname", "lastname", "middlename" ]
        }
      }
    }'
4 answers

You need a multi_match query, but it does not behave exactly as you would like.

Compare the validation output for multi_match and query_string.

multi_match (with the and operator) requires that ALL terms exist together in at least one single field:

    curl -XGET 'http://127.0.0.1:9200/_validate/query?pretty=1&explain=true' -d '
    {
       "multi_match" : {
          "operator" : "and",
          "fields" : [ "firstname", "lastname" ],
          "query" : "john smith"
       }
    }
    '

    # {
    #    "_shards" : { "failed" : 0, "successful" : 1, "total" : 1 },
    #    "explanations" : [
    #       {
    #          "index" : "test",
    #          "explanation" : "((+lastname:john +lastname:smith) | (+firstname:john +firstname:smith))",
    #          "valid" : true
    #       }
    #    ],
    #    "valid" : true
    # }

Whereas query_string (with default_operator AND) checks that EACH term exists in at least one field:

    curl -XGET 'http://127.0.0.1:9200/_validate/query?pretty=1&explain=true' -d '
    {
       "query_string" : {
          "fields" : [ "firstname", "lastname" ],
          "query" : "john smith",
          "default_operator" : "AND"
       }
    }
    '

    # {
    #    "_shards" : { "failed" : 0, "successful" : 1, "total" : 1 },
    #    "explanations" : [
    #       {
    #          "index" : "test",
    #          "explanation" : "+(firstname:john | lastname:john) +(firstname:smith | lastname:smith)",
    #          "valid" : true
    #       }
    #    ],
    #    "valid" : true
    # }
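
To make the difference between the two explanations concrete, here is a small toy model in plain Python. It does not talk to Elasticsearch; the whitespace tokenizer and the function names are assumptions made purely for illustration, and the two documents are the ones from the question.

```python
# Toy model of the two explanations above; plain Python, not Elasticsearch.
DOCS = {
    1: {"firstname": "john", "middlename": "clark", "lastname": "smith"},
    2: {"firstname": "john", "middlename": "paladini", "lastname": "miranda"},
}
FIELDS = ["firstname", "middlename", "lastname"]


def field_tokens(doc, field):
    """Lowercased whitespace tokens of one field (a stand-in for a real analyzer)."""
    return set(doc.get(field, "").lower().split())


def all_terms_in_one_field(doc, terms, fields):
    """Models multi_match with operator=and: one single field must hold every term."""
    return any(all(t in field_tokens(doc, f) for t in terms) for f in fields)


def each_term_in_any_field(doc, terms, fields):
    """Models query_string with default_operator=AND: every term must be in some field."""
    return all(any(t in field_tokens(doc, f) for f in fields) for t in terms)


def search(predicate, query, fields):
    """Return the sorted ids of the toy documents accepted by the predicate."""
    terms = query.lower().split()
    return sorted(doc_id for doc_id, doc in DOCS.items() if predicate(doc, terms, fields))


per_field = search(all_terms_in_one_field, "john smith", FIELDS)
per_term = search(each_term_in_any_field, "john smith", FIELDS)
print(per_field)  # []
print(per_term)   # [1]
```

Under the first rule no document matches, because no single field contains both words; under the second rule only document 1 matches, which is the behavior the question asks for.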

So, you have several options to achieve what you need:

  1. Prepare your search terms to remove characters such as wildcards, etc., before using query_string

  2. Prepare your search terms to extract each word, and then generate a multi_match request for each word

  3. Use index_name in your mapping for the name fields to index their data into a single field, which you can then search (i.e. your own custom "all" field), as follows:

    curl -XPUT 'http://127.0.0.1:9200/test/?pretty=1' -d '
    {
       "mappings" : {
          "test" : {
             "properties" : {
                "firstname" : { "index_name" : "name", "type" : "string" },
                "lastname" : { "index_name" : "name", "type" : "string" }
             }
          }
       }
    }
    '

    curl -XPOST 'http://127.0.0.1:9200/test/test?pretty=1' -d '
    {
       "firstname" : "john",
       "lastname" : "smith"
    }
    '

    curl -XGET 'http://127.0.0.1:9200/test/test/_search?pretty=1' -d '
    {
       "query" : {
          "match" : {
             "name" : { "operator" : "and", "query" : "john smith" }
          }
       }
    }
    '

    # {
    #    "hits" : {
    #       "hits" : [
    #          {
    #             "_source" : { "firstname" : "john", "lastname" : "smith" },
    #             "_score" : 0.2712221,
    #             "_index" : "test",
    #             "_id" : "VJFU_RWbRNaeHF9wNM8fRA",
    #             "_type" : "test"
    #          }
    #       ],
    #       "max_score" : 0.2712221,
    #       "total" : 1
    #    },
    #    "timed_out" : false,
    #    "_shards" : { "failed" : 0, "successful" : 5, "total" : 5 },
    #    "took" : 33
    # }

Note that firstname and lastname are no longer searchable independently. The data for both fields has been indexed into name.

You can use multi-fields with the path parameter to make them searchable, both independently and together, as follows:

    curl -XPUT 'http://127.0.0.1:9200/test/?pretty=1' -d '
    {
       "mappings" : {
          "test" : {
             "properties" : {
                "firstname" : {
                   "fields" : {
                      "firstname" : { "type" : "string" },
                      "any_name" : { "type" : "string" }
                   },
                   "path" : "just_name",
                   "type" : "multi_field"
                },
                "lastname" : {
                   "fields" : {
                      "any_name" : { "type" : "string" },
                      "lastname" : { "type" : "string" }
                   },
                   "path" : "just_name",
                   "type" : "multi_field"
                }
             }
          }
       }
    }
    '

    curl -XPOST 'http://127.0.0.1:9200/test/test?pretty=1' -d '
    {
       "firstname" : "john",
       "lastname" : "smith"
    }
    '

Searching the any_name field works:

    curl -XGET 'http://127.0.0.1:9200/test/test/_search?pretty=1' -d '
    {
       "query" : {
          "match" : {
             "any_name" : { "operator" : "and", "query" : "john smith" }
          }
       }
    }
    '

    # {
    #    "hits" : {
    #       "hits" : [
    #          {
    #             "_source" : { "firstname" : "john", "lastname" : "smith" },
    #             "_score" : 0.2712221,
    #             "_index" : "test",
    #             "_id" : "Xf9qqKt0TpCuyLWioNh-iQ",
    #             "_type" : "test"
    #          }
    #       ],
    #       "max_score" : 0.2712221,
    #       "total" : 1
    #    },
    #    "timed_out" : false,
    #    "_shards" : { "failed" : 0, "successful" : 5, "total" : 5 },
    #    "took" : 11
    # }

Searching the firstname field for john AND smith returns nothing:

    curl -XGET 'http://127.0.0.1:9200/test/test/_search?pretty=1' -d '
    {
       "query" : {
          "match" : {
             "firstname" : { "operator" : "and", "query" : "john smith" }
          }
       }
    }
    '

    # {
    #    "hits" : {
    #       "hits" : [],
    #       "max_score" : null,
    #       "total" : 0
    #    },
    #    "timed_out" : false,
    #    "_shards" : { "failed" : 0, "successful" : 5, "total" : 5 },
    #    "took" : 2
    # }

But searching the firstname field for just john works correctly:

    curl -XGET 'http://127.0.0.1:9200/test/test/_search?pretty=1' -d '
    {
       "query" : {
          "match" : {
             "firstname" : { "operator" : "and", "query" : "john" }
          }
       }
    }
    '

    # {
    #    "hits" : {
    #       "hits" : [
    #          {
    #             "_source" : { "firstname" : "john", "lastname" : "smith" },
    #             "_score" : 0.30685282,
    #             "_index" : "test",
    #             "_id" : "Xf9qqKt0TpCuyLWioNh-iQ",
    #             "_type" : "test"
    #          }
    #       ],
    #       "max_score" : 0.30685282,
    #       "total" : 1
    #    },
    #    "timed_out" : false,
    #    "_shards" : { "failed" : 0, "successful" : 5, "total" : 5 },
    #    "took" : 3
    # }
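
The second option listed above (split the input into words and issue one multi_match per word) could be sketched roughly as follows. This is only an illustration in Python: build_per_word_query is a hypothetical helper, the whitespace split is a simplification, and whether the body needs the outer "query" wrapper depends on the Elasticsearch version and endpoint you send it to.

```python
import json


def build_per_word_query(user_input, fields):
    """Hypothetical helper: require every word to match in at least one field.

    Each word becomes its own multi_match clause inside bool/must, so the
    words may be spread over different fields. Because multi_match does not
    parse query syntax, input such as "AND" or "*" is treated as plain text.
    """
    words = user_input.split()
    if not words:
        # Assumption: an empty search box should match everything.
        return {"match_all": {}}
    return {
        "bool": {
            "must": [
                {"multi_match": {"query": word, "fields": list(fields)}}
                for word in words
            ]
        }
    }


query = build_per_word_query("john smith", ["firstname", "middlename", "lastname"])
print(json.dumps({"query": query}, indent=2))
```

The printed JSON can be passed as the request body of a search, in place of the query_string body from the question.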

I would prefer to avoid using query_string in case the user passes "OR", "AND" and any other advanced parameter.

In my experience, escaping the special characters with a backslash is a simple and effective solution. The list of special characters can be found in the Lucene documentation at http://lucene.apache.org/core/4_5_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#package_description ; in addition, the keywords AND / OR / NOT / TO need to be escaped.
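
A minimal sketch of that escaping step in Python is shown below. The function name escape_query_string is hypothetical, the character set is taken from the special characters in the Lucene 4.x documentation linked above, and prefixing the operator keywords with a backslash follows the suggestion in this answer; check the result against your own Lucene/Elasticsearch version before relying on it.

```python
import re

# Special characters of the Lucene classic query syntax: + - & | ! ( ) { } [ ] ^ " ~ * ? : \ /
_SPECIAL_CHARS = re.compile(r'([+\-!(){}\[\]^"~*?:\\/&|])')

# Keywords that the query parser would otherwise interpret as operators.
_OPERATOR_WORDS = {"AND", "OR", "NOT", "TO"}


def escape_query_string(user_input):
    """Hypothetical helper: backslash-escape user input before passing it to query_string."""
    escaped_words = []
    for word in user_input.split():
        # Put a backslash in front of every special character.
        word = _SPECIAL_CHARS.sub(r"\\\1", word)
        # Put a backslash in front of a bare operator keyword.
        if word in _OPERATOR_WORDS:
            word = "\\" + word
        escaped_words.append(word)
    return " ".join(escaped_words)


print(escape_query_string('john AND (smith OR "doe")'))
# john \AND \(smith \OR \"doe\"\)
```

The escaped string can then be used as the "query" value of the query_string request from the question, with default_operator still set to AND.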


I think the match query is what you are looking for:

"The match family of queries does not go through a "query parsing" process. It does not support field name prefixes, wildcard characters, or other "advanced" features. For this reason, chances of it failing are very small / non existent, and it provides an excellent behavior when it comes to just analyze and run that text as a query behavior (which is usually what a text search box does)."

http://www.elasticsearch.org/guide/reference/query-dsl/match-query.html


Nowadays you can use the cross_fields type in multi_match:

    GET /_validate/query?explain
    {
      "query": {
        "multi_match": {
          "query": "peter smith",
          "type": "cross_fields",
          "operator": "and",
          "fields": [ "firstname", "lastname", "middlename" ]
        }
      }
    }

The cross_fields type takes a term-centric approach. It treats all the fields as one big field and looks for each term in any of the fields.

However, it should be noted that if you want it to work optimally, all analyzed fields must have the same analyzer (standard, English, etc.):

For the cross_fields query type to work optimally, all fields should have the same analyzer. Fields that share an analyzer are grouped together as blended fields.

If you include fields with a different analysis chain, they will be added to the query in the same way as for best_fields. For example, if we added a title field to the preceding query (assuming it uses a different analyzer), the explanation would be as follows:

    (+title:peter +title:smith) (+blended("peter", fields: [first_name, last_name]) +blended("smith", fields: [first_name, last_name]))
