ElasticSearch: Sorting by values of nested documents

I am having a problem using ElasticSearch in my Java application. Let me explain: I have a mapping that looks something like this:

{ "products": { "properties": { "id": { "type": "long", "ignore_malformed": false }, "locations": { "properties": { "category": { "type": "long", "ignore_malformed": false }, "subCategory": { "type": "long", "ignore_malformed": false }, "order": { "type": "long", "ignore_malformed": false } } }, ... 

So, as you can see, I have a list of products, each of which contains locations. In my model, these locations are the categories a product belongs to. This means that a product can be in one or more categories. In each of these categories, the product has an order in which the customer wants it to be shown.

For example, a diamond product may be in first place in Jewelry, but in third place in Women (my examples are not very logical ^^). Therefore, when I click on "Jewelry", I want to display these products sorted by the order field of their location in this category.

At the moment, when I search for all the products of a certain category, the response I get from ElasticSearch looks something like this:

 {"id":5331880,"locations":[{"category":5322606,"order":1}, {"category":5883712,"subCategory":null,"order":3}, {"category":5322605,"subCategory":6032961,"order":2},....... 

Can I sort these products by locations.order for the specific category I'm looking for? For example, if I request category 5322606, I want order 1 to be used for this product.

Thanks in advance! Sincerely, Olivier.

2 answers

First, a note on terminology: in Elasticsearch, "parent/child" refers to completely separate documents, where the child document points to the parent document. Parent and children are stored in the same shard, but they can be updated independently.

In the example above, what you are trying to achieve can be accomplished with nested docs.

Your locations field currently has type:"object". This means that the values in each location are flattened to look something like this:

 { "locations.category": [5322606, 5883712, 5322605], "locations.subCategory": [6032961], "locations.order": [1, 3, 2] } 

In other words, the "sub" fields are flattened into multi-valued fields, which is useless for you because there is no longer any correlation between category: 5322606 and order: 1 .
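To make the loss of correlation concrete, here is a small Python sketch that only imitates the flattening in memory. It does not talk to Elasticsearch, and the helper names (flatten_object, matches_flattened, matches_nested) are made up for illustration:

```python
def flatten_object(locations):
    """Imitate an "object" field: one multi-valued list per sub-field."""
    flat = {}
    for loc in locations:
        for key, value in loc.items():
            if value is not None:  # null values contribute nothing to the list
                flat.setdefault("locations." + key, []).append(value)
    return flat


def matches_flattened(flat, category, order):
    """Each condition is checked against its own list, independently."""
    return (category in flat.get("locations.category", [])
            and order in flat.get("locations.order", []))


def matches_nested(locations, category, order):
    """Both conditions must hold inside the same location."""
    return any(loc.get("category") == category and loc.get("order") == order
               for loc in locations)


locations = [
    {"category": 5322606, "order": 1},
    {"category": 5883712, "subCategory": None, "order": 3},
    {"category": 5322605, "subCategory": 6032961, "order": 2},
]

flat = flatten_object(locations)
print(flat)
# Category 5883712 has order 3, yet the flattened form also "matches" order 1.
print(matches_flattened(flat, 5883712, 1))
print(matches_nested(locations, 5883712, 1))
```

The flattened check succeeds for a category/order pair that never occurs together in any single location, which is exactly the false positive that the nested type avoids.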

However, if you change locations to type:"nested" , then each location is indexed as a separate document, which means that each location can be queried independently using the dedicated nested query and filter .

By default, a nested query returns a _score based on how well each location matches, but in your case you want to return the highest value of the order field among all the matching nested documents. To do this, you need to use the custom_score query.
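As a rough mental model of what custom_score combined with score_mode "max" computes, here is a plain Python sketch. It is an in-memory imitation under my own assumptions, not Elasticsearch code, and nested_max_score is a hypothetical helper name:

```python
def nested_max_score(locations, category, sub_category=None):
    """Score each matching location by its order, then keep the maximum.

    Mirrors the idea of a filter on category (and optionally subCategory),
    a script score of doc['locations.order'].value, and score_mode "max".
    Returns None when no location matches, i.e. the product is not a hit.
    """
    scores = [
        loc["order"]
        for loc in locations
        if loc.get("category") == category
        and (sub_category is None or loc.get("subCategory") == sub_category)
    ]
    return max(scores) if scores else None


locations = [
    {"category": 5322606, "order": 1},
    {"category": 5883712, "subCategory": None, "order": 3},
    {"category": 5322605, "subCategory": 6032961, "order": 2},
]

print(nested_max_score(locations, 5322605, 6032961))
print(nested_max_score(locations, 5322606))
print(nested_max_score(locations, 999))
```

When only one location per product can match the requested category, the maximum is simply that location's order, which is the value you want to sort by.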

So, let's start by creating an index with the appropriate mapping:

 curl -XPUT 'http://127.0.0.1:9200/test/?pretty=1' -d ' { "mappings" : { "products" : { "properties" : { "locations" : { "type" : "nested", "properties" : { "order" : { "type" : "long" }, "subCategory" : { "type" : "long" }, "category" : { "type" : "long" } } }, "id" : { "type" : "long" } } } } } ' 

We index your doc example:

 curl -XPOST 'http://127.0.0.1:9200/test/products?pretty=1' -d ' { "locations" : [ { "order" : 1, "category" : 5322606 }, { "order" : 3, "subCategory" : null, "category" : 5883712 }, { "order" : 2, "subCategory" : 6032961, "category" : 5322605 } ], "id" : 5331880 } ' 

And now we can search for it using the queries we discussed above:

 curl -XGET 'http://127.0.0.1:9200/test/products/_search?pretty=1' -d ' { "query" : { "nested" : { "query" : { "custom_score" : { "script" : "doc[\u0027locations.order\u0027].value", "query" : { "constant_score" : { "filter" : { "and" : [ { "term" : { "category" : 5322605 } }, { "term" : { "subCategory" : 6032961 } } ] } } } } }, "score_mode" : "max", "path" : "locations" } } } ' 

Note: single quotes in the script were escaped as \u0027 to bypass shell quoting. The script looks like this: "doc['locations.order'].value"

If you look at the _score in the results, you will see that it took the order value from the matching location :

 { "hits" : { "hits" : [ { "_source" : { "locations" : [ { "order" : 1, "category" : 5322606 }, { "order" : 3, "subCategory" : null, "category" : 5883712 }, { "order" : 2, "subCategory" : 6032961, "category" : 5322605 } ], "id" : 5331880 }, "_score" : 2, "_index" : "test", "_id" : "cXTFUHlGTKi0hKAgUJFcBw", "_type" : "products" } ], "max_score" : 2, "total" : 1 }, "timed_out" : false, "_shards" : { "failed" : 0, "successful" : 5, "total" : 5 }, "took" : 9 } 
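One thing to keep in mind: hits are sorted by _score in descending order by default, so a product with order 1 would come last. A possible way around this is to add an explicit sort on _score in ascending order. The Python sketch below only builds and prints such a request body. The build_search_body helper is hypothetical, the single term filter on category is my simplification of the query above, and you should check the sort clause against the version of Elasticsearch you are running:

```python
import json


def build_search_body(category, ascending=True):
    """Build the nested custom_score query plus an explicit sort on _score."""
    return {
        "query": {
            "nested": {
                "path": "locations",
                "score_mode": "max",
                "query": {
                    "custom_score": {
                        # The order of the matching location becomes the score.
                        "script": "doc['locations.order'].value",
                        "query": {
                            "constant_score": {
                                "filter": {"term": {"category": category}}
                            }
                        },
                    }
                },
            }
        },
        # Assumption: ascending _score puts order 1 first.
        "sort": [{"_score": {"order": "asc" if ascending else "desc"}}],
    }


body = build_search_body(5322606)
print(json.dumps(body, indent=2))
```

The printed JSON can be passed as the -d payload of the search request shown earlier.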

Just adding a more up-to-date version for sorting parent documents. We can query the parent document type and sort the results by a child field (for example, "count"), as shown in this gist:

https://gist.github.com/robinloxley1/7ea7c4f37a3413b1ca16
