ElasticSearch Join Filter: Use subquery results as filter input?

I have a use case where I want to use ElasticSearch for real-time analytics. As part of this, I want to be able to calculate some simple affinity scores.

Currently, these are defined by comparing the number of transactions performed by a user base filtered on some criteria with the number performed by the full user base.

As I see it, I will need to do the following:

  • Get the distinct transactions of my filtered user base
  • Query for these transactions (types) in the full user base
  • Do the calculation (ratios, etc.)

To get the distinct transactions for the filtered user base, I am currently using a terms facet query that returns all terms (transaction types). As far as I understand, I would need to use this result as the input of a terms filter query in the second step to get the result that I want.

I read that there is a pull request on GitHub that seems to implement this (https://github.com/elasticsearch/elasticsearch/pull/3278), but it is not clear to me whether it can already be used in the current release.

If not, are there any workarounds for this problem?

As additional information, here is my mapping:

curl -XPUT 'http://localhost:9200/store/user/_mapping' -d '
{
  "user": {
    "properties": {
      "user_id": { "type": "integer" },
      "gender": { "type": "string", "index": "not_analyzed" },
      "age": { "type": "integer" },
      "age_bracket": { "type": "string", "index": "not_analyzed" },
      "current_city": { "type": "string", "index": "not_analyzed" },
      "relationship_status": { "type": "string", "index": "not_analyzed" },
      "transactions": {
        "type": "nested",
        "properties": {
          "t_id": { "type": "integer" },
          "t_oid": { "type": "string", "index": "not_analyzed" },
          "t_name": { "type": "string", "index": "not_analyzed" },
          "tt_id": { "type": "integer" },
          "tt_name": { "type": "string", "index": "not_analyzed" }
        }
      }
    }
  }
}'

So, for my example use case, the desired result would come from the following steps:

  • My filtered user base has this filter: "gender": "male" and "relationship_status": "single". For this base, I want to get the distinct transaction types (field "tt_name" of the nested document) and count the number of distinct user IDs (user_id).
  • Next, I want to query my complete user base (with no filter other than the list of transaction types from step 1) and count the number of distinct user IDs.
  • Perform the affinity calculation.
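
The three steps above could be sketched client-side as two request bodies plus a ratio. This is only a minimal sketch, not a tested solution: it assumes a cluster version that supports the `nested` and `reverse_nested` aggregations, the field names are taken from the mapping above, and the helper names (`users_per_type_aggs`, `step1_body`, `step2_body`, `affinity`) as well as the affinity formula itself (share of the filtered base divided by share of the full base) are illustrative assumptions.

```python
def users_per_type_aggs():
    """Aggregation that counts user documents per nested transaction type."""
    return {
        "transactions": {
            "nested": {"path": "transactions"},
            "aggs": {
                "types": {
                    "terms": {"field": "transactions.tt_name", "size": 20},
                    # reverse_nested joins back to the parent user document,
                    # so its doc_count is the number of users per type.
                    "aggs": {"users": {"reverse_nested": {}}},
                }
            },
        }
    }


def step1_body():
    """Step 1: transaction types within the filtered user base."""
    return {
        "size": 0,
        "query": {"bool": {"must": [
            {"term": {"gender": "male"}},
            {"term": {"relationship_status": "single"}},
        ]}},
        "aggs": users_per_type_aggs(),
    }


def step2_body(type_names):
    """Step 2: full user base, restricted to the types found in step 1."""
    return {
        "size": 0,
        "query": {"nested": {
            "path": "transactions",
            "query": {"terms": {"transactions.tt_name": list(type_names)}},
        }},
        "aggs": users_per_type_aggs(),
    }


def affinity(filtered_users, filtered_total, full_users, full_total):
    """Step 3 (assumed formula): ratio of the two user shares."""
    return (filtered_users / filtered_total) / (full_users / full_total)
```

With this sketch, the bucket keys from the step 1 response would be passed to `step2_body`, and the total user counts would have to come from each response's hit total. Note that the step 2 aggregation still sees every nested transaction of the matching users, so the client would keep only the buckets whose keys came from step 1.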
2 answers

Here's a link to a runnable example:

http://sense.qbox.io/gist/9da6a30fc12c36f90ae39111a08df283b56ec03c

It involves documents that look like this:

 { "transaction_type" : "some_transaction", "user_base" : "some_user_base_id" } 

The query is configured to return no hits ("size": 0), since the aggregations take care of calculating the statistics you are looking for:

 {
   "size" : 0,
   "query" : { "match_all" : {} },
   "aggs" : {
     "distinct_transactions" : {
       "terms" : { "field" : "transaction_type", "size" : 20 },
       "aggs" : {
         "by_user_base" : {
           "terms" : { "field" : "user_base", "size" : 20 }
         }
       }
     }
   }
 }

And here is the result:

  "aggregations": {
    "distinct_transactions": {
      "buckets": [
        {
          "key": "subscribe",
          "doc_count": 4,
          "by_user_base": {
            "buckets": [
              { "key": "2", "doc_count": 3 },
              { "key": "1", "doc_count": 1 }
            ]
          }
        },
        {
          "key": "purchase",
          "doc_count": 3,
          "by_user_base": {
            "buckets": [
              { "key": "1", "doc_count": 2 },
              { "key": "2", "doc_count": 1 }
            ]
          }
        }
      ]
    }
  }

So, inside "aggregations" you will have a list of "distinct_transactions" buckets. The key is the transaction type, and doc_count is the total number of transactions of that type across all user bases.

Within each distinct_transactions bucket there is a by_user_base, which is another terms aggregation (a sub-aggregation). As with the transaction types, the key is the user base name (or ID, or whatever you use), and doc_count is the number of transactions of that type for that user base.
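
To turn those counts into ratios, the buckets can be walked on the client. The following is a minimal sketch that uses the response shown above, copied into a Python dict; the function name `share_by_user_base` is hypothetical, and dividing by the type's total doc_count is just one possible ratio.

```python
# Aggregation part of the response shown above, copied as a Python dict.
response = {"aggregations": {"distinct_transactions": {"buckets": [
    {"key": "subscribe", "doc_count": 4, "by_user_base": {"buckets": [
        {"key": "2", "doc_count": 3}, {"key": "1", "doc_count": 1}]}},
    {"key": "purchase", "doc_count": 3, "by_user_base": {"buckets": [
        {"key": "1", "doc_count": 2}, {"key": "2", "doc_count": 1}]}},
]}}}


def share_by_user_base(resp):
    """Map each transaction type to {user_base: share of that type's docs}."""
    shares = {}
    for bucket in resp["aggregations"]["distinct_transactions"]["buckets"]:
        total = bucket["doc_count"]
        shares[bucket["key"]] = {
            sub["key"]: sub["doc_count"] / total
            for sub in bucket["by_user_base"]["buckets"]
        }
    return shares


shares = share_by_user_base(response)
print(shares["subscribe"]["2"])  # 3 of the 4 "subscribe" documents: 0.75
```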

Is that what you wanted to do? Hope I helped.


The current version of ElasticSearch has a new aggregation type, significant_terms, that can be used to calculate affinity estimates for my use case more easily.

All the metrics relevant to me can be calculated in a single step, which is very nice!
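
For illustration, a request body using significant_terms against the mapping from the question might look like the sketch below. This is an assumption rather than the query actually used: the function name `significant_types_body` and the aggregation names are made up, the filter values are the ones from the question's example, and whether the background counts over nested documents line up with per-user counts has not been verified here.

```python
import json


def significant_types_body(field="transactions.tt_name"):
    """Request body: transaction types over-represented in the filtered base."""
    return {
        "size": 0,
        # Foreground set: the filtered user base from the question's example.
        "query": {"bool": {"must": [
            {"term": {"gender": "male"}},
            {"term": {"relationship_status": "single"}},
        ]}},
        "aggs": {
            "transactions": {
                # Step into the nested transaction documents first.
                "nested": {"path": "transactions"},
                "aggs": {
                    "significant_types": {
                        "significant_terms": {"field": field}
                    }
                },
            }
        },
    }


print(json.dumps(significant_types_body(), indent=2))
```

Each bucket in a significant_terms response carries a foreground doc_count, a background bg_count, and a score, which is what makes a single request sufficient for this kind of comparison.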
