ElasticSearch Join Filter: Use subquery results as filter input?

Question

I have a use case where I want to use ElasticSearch for real-time analytics. As part of this, I want to be able to calculate some simple affinity scores.

Currently, these scores are computed by comparing the number of transactions performed by a user base filtered by some criteria with the same numbers for the full user base.

As I see it, I will need to do the following:

1. Get the distinct transaction types of my filtered user base
2. Query the full user base for these transaction types
3. Perform the calculation (ratios, etc.)

To get the “distinct transactions” for a filtered user base, I currently use a filtered query with a terms facet that returns all terms (transaction types). As far as I understand, I then need to use this result as the input of a terms filter query in the second step to get the result I want.
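
Without a server-side join, the two steps can be chained on the client: read the transaction types from the first response and paste them into a terms filter for the second request. A minimal Python sketch, assuming the first request returns terms-aggregation buckets (or terms-facet entries) and using the Elasticsearch 1.x filtered query / nested filter syntax; keys_from_step1 and build_step2_query are hypothetical helpers, and the sample buckets are made up:

```python
import json


def keys_from_step1(entries):
    """Collect transaction types from the first request's result.
    Handles terms-aggregation buckets ("key") and terms-facet entries ("term")."""
    return [entry["key"] if "key" in entry else entry["term"] for entry in entries]


def build_step2_query(transaction_types):
    """Second request: all users with at least one nested transaction whose
    tt_name is in the given list (assumed Elasticsearch 1.x syntax)."""
    return {
        "size": 0,  # only aggregations/counts are of interest
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {
                    "nested": {
                        "path": "transactions",
                        "filter": {
                            "terms": {"transactions.tt_name": list(transaction_types)}
                        },
                    }
                },
            }
        },
    }


# Hypothetical first-step result; in practice it comes from the first search response.
step1_entries = [{"key": "purchase", "doc_count": 12}, {"key": "subscribe", "doc_count": 5}]
print(json.dumps(build_step2_query(keys_from_step1(step1_entries)), indent=2))
```

The terms filter only restricts which users match; the per-type counting aggregation from the first request would be added to this second body as well.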

I read that there is a pull request on GitHub that seems to implement this (https://github.com/elasticsearch/elasticsearch/pull/3278), but it’s not obvious to me whether it can already be used in the current version.

If not, are there any workarounds for this problem?

As additional information, here is my mapping:

curl -XPUT 'http://localhost:9200/store/user/_mapping' -d ' { "user": { "properties": { "user_id": { "type": "integer" }, "gender": { "type": "string", "index" : "not_analyzed" }, "age": { "type": "integer" }, "age_bracket": { "type": "string", "index" : "not_analyzed" }, "current_city": { "type": "string", "index" : "not_analyzed" }, "relationship_status": { "type": "string", "index" : "not_analyzed" }, "transactions" : { "type": "nested", "properties" : { "t_id": { "type": "integer" }, "t_oid": { "type": "string", "index" : "not_analyzed" }, "t_name": { "type": "string", "index" : "not_analyzed" }, "tt_id": { "type": "integer" }, "tt_name": { "type": "string", "index" : "not_analyzed" } } } } } } }'
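
Against this mapping, the first step (distinct transaction types of the filtered user base, with a user count per type) might be expressed with aggregations instead of a facet. This is a sketch only: it assumes Elasticsearch 1.2 or later (for the nested and reverse_nested aggregations) and the 1.x filtered query syntax, and build_step1_query is a hypothetical helper that just builds the request body:

```python
import json


def build_step1_query(gender, relationship_status, max_types=100):
    """Sketch of step 1 for the mapping above (assumes Elasticsearch 1.2+):
    restrict to the filtered user base, then count users per nested tt_name."""
    return {
        "size": 0,  # only the aggregations are needed, not the user documents
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {
                    "bool": {
                        "must": [
                            {"term": {"gender": gender}},
                            {"term": {"relationship_status": relationship_status}},
                        ]
                    }
                },
            }
        },
        "aggs": {
            "transactions": {
                # step into the nested transaction documents
                "nested": {"path": "transactions"},
                "aggs": {
                    "types": {
                        # one bucket per distinct transaction type (raise max_types if needed)
                        "terms": {"field": "transactions.tt_name", "size": max_types},
                        "aggs": {
                            # jump back to the parent user documents; this bucket's
                            # doc_count is the number of users having that type
                            "users": {"reverse_nested": {}}
                        },
                    }
                },
            }
        },
    }


print(json.dumps(build_step1_query("male", "single"), indent=2))
```

The body would be sent to the store/user/_search endpoint. Because reverse_nested counts parent (user) documents rather than transactions, its doc_count gives distinct users per transaction type without a separate cardinality aggregation.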

So, concretely, for my use case I would do the following:

1. My filtered user base uses this filter: "gender": "male" and "relationship_status": "single". For it, I want to get the distinct transaction types (field "tt_name" of the nested documents) and count the number of distinct user_ids per type.
2. Next, I want to query my complete user base (with no filter other than the list of transaction types from 1.) and count the number of distinct user_ids per type.
3. Perform the affinity calculations
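
The question leaves the exact formula for the last step open. As one hypothetical choice, a lift-style ratio compares the share of filtered users with a given transaction type to the share of all users with it; values above 1 mean the filtered group over-indexes on that type. A small Python sketch with made-up numbers (affinity_scores is an illustrative name, not an Elasticsearch feature):

```python
def affinity_scores(filtered_counts, filtered_total, full_counts, full_total):
    """Hypothetical lift-style affinity per transaction type:
    (share of filtered users with the type) / (share of all users with the type).

    filtered_counts / full_counts map tt_name -> number of distinct users
    (e.g. the per-type user counts from steps 1 and 2);
    filtered_total / full_total are the sizes of the two user bases."""
    scores = {}
    for tx_type, filtered_users in filtered_counts.items():
        full_users = full_counts.get(tx_type, 0)
        if filtered_total == 0 or full_users == 0:
            continue  # no basis for comparison
        # (a / b) / (c / d) rewritten as (a * d) / (b * c): a single division
        scores[tx_type] = (filtered_users * full_total) / (filtered_total * full_users)
    return scores


# Made-up numbers: 100 single men out of 1000 users in total.
print(affinity_scores({"purchase": 30, "subscribe": 5}, 100,
                      {"purchase": 200, "subscribe": 100}, 1000))
# {'purchase': 1.5, 'subscribe': 0.5}
```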

join filter subquery elasticsearch term

Tobi Feb 17 '14 at 15:33

2 answers

There is a new aggregation type, significant_terms, in the current version of ElasticSearch that makes it much easier to calculate affinity scores for my use case.

http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_significant_terms_demo.html#_recommending_based_on_statistics

All the metrics I need can be calculated in a single request, which is very nice!
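
For the mapping in the question, a single significant_terms request might look like the sketch below. It assumes Elasticsearch 1.1 or later and places significant_terms under a nested aggregation on transactions, so it compares transaction documents rather than distinct users; whether that matches the intended affinity measure, and how the background counts behave inside a nested aggregation in a given version, is worth checking. build_significant_types_query is a hypothetical helper:

```python
import json


def build_significant_types_query(gender, relationship_status):
    """Sketch: let significant_terms compare tt_name frequencies in the filtered
    user base (foreground) against the whole index (background) in one request.
    Assumes Elasticsearch 1.1+ and the nested mapping from the question."""
    return {
        "size": 0,  # no hits needed, only the aggregation
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {
                    "bool": {
                        "must": [
                            {"term": {"gender": gender}},
                            {"term": {"relationship_status": relationship_status}},
                        ]
                    }
                },
            }
        },
        "aggs": {
            "transactions": {
                # tt_name lives in nested documents, so step into them first
                "nested": {"path": "transactions"},
                "aggs": {
                    "significant_types": {
                        "significant_terms": {"field": "transactions.tt_name"}
                    }
                },
            }
        },
    }


print(json.dumps(build_significant_types_query("male", "single"), indent=2))
```

Each returned bucket carries the term (key), its doc_count in the filtered set, its bg_count in the background set, and a score.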

Tobi Feb 19 '15 at 8:11

Ben at Qbox.io · Accepted Answer · 2014-02-26T19:43:03+0000

Here's a link to a runnable example:

http://sense.qbox.io/gist/9da6a30fc12c36f90ae39111a08df283b56ec03c

It involves documents that look like this:

 { "transaction_type" : "some_transaction", "user_base" : "some_user_base_id" }

The query is configured to return no hits ("size" : 0), since the aggregations take care of calculating the statistics you are looking for:

 { "size" : 0, "query" : { "match_all" : {} }, "aggs" : { "distinct_transactions" : { "terms" : { "field" : "transaction_type", "size" : 20 }, "aggs" : { "by_user_base" : { "terms" : { "field" : "user_base", "size" : 20 } } } } } }

And here is the result:

  "aggregations": { "distinct_transactions": { "buckets": [ { "key": "subscribe", "doc_count": 4, "by_user_base": { "buckets": [ { "key": "2", "doc_count": 3 }, { "key": "1", "doc_count": 1 } ] } }, { "key": "purchase", "doc_count": 3, "by_user_base": { "buckets": [ { "key": "1", "doc_count": 2 }, { "key": "2", "doc_count": 1 } ] } } ] } }

So, inside "aggregations" you get a list of "distinct_transactions" buckets. The key is the transaction type, and doc_count is the total number of transactions of that type across all users.

Within each distinct_transactions bucket there is a by_user_base bucket list, produced by a terms sub-aggregation. As with the transactions, the key is the user base name (or ID, or whatever identifies it), and doc_count is the number of transactions of that type within that user base.
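
To turn these sub-aggregation buckets into ratios, the response can be walked on the client. A minimal Python sketch using the sample response above (user_base_shares is a made-up name for illustration); it divides each user base's doc_count by the transaction type's total doc_count:

```python
import json

# The "aggregations" excerpt from the response above, wrapped in braces so it parses.
RESPONSE = json.loads("""
{"aggregations": {"distinct_transactions": {"buckets": [
  {"key": "subscribe", "doc_count": 4, "by_user_base": {"buckets": [
    {"key": "2", "doc_count": 3}, {"key": "1", "doc_count": 1}]}},
  {"key": "purchase", "doc_count": 3, "by_user_base": {"buckets": [
    {"key": "1", "doc_count": 2}, {"key": "2", "doc_count": 1}]}}]}}}
""")


def user_base_shares(response):
    """For each transaction type, return each user base's fraction of its transactions."""
    shares = {}
    for tx_bucket in response["aggregations"]["distinct_transactions"]["buckets"]:
        total = tx_bucket["doc_count"]  # all transactions of this type
        shares[tx_bucket["key"]] = {
            ub_bucket["key"]: ub_bucket["doc_count"] / total
            for ub_bucket in tx_bucket["by_user_base"]["buckets"]
        }
    return shares


for tx_type, per_base in user_base_shares(RESPONSE).items():
    print(tx_type, per_base)
# subscribe {'2': 0.75, '1': 0.25}
# purchase {'1': 0.6666666666666666, '2': 0.3333333333333333}
```

The shares only add up to 1 when every transaction has exactly one user_base and the sub-aggregation's size covers all user bases, as in this sample.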

Is that what you were looking for? Hope this helps.