I have a use case where I want to use ElasticSearch for real-time analytics. As part of this, I want to be able to calculate some simple affinity estimates.
Currently, they are defined as the number of transactions performed by a user base filtered by some criteria, compared with the same number for the full user base.
As I understand it, I need to do the following:
- Get the distinct transactions of my filtered user base
- Query for these transactions (types) in the full user base
- Do the calculation (normalization, etc.)
To get the "distinct transactions" for the filtered user base, I am currently using a terms filter query with faceting that returns all terms (transaction types). As far as I understand, I need to use this result as the input of a terms filter query for the second step to get the result that I want.
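A minimal sketch of that hand-off in Python, under several assumptions: the first step's response has the shape of a terms facet result (`facets.<name>.terms[].term`), the facet name `tt_names` and the helper `build_step_two_body` are hypothetical, and the request body uses the older `filtered` query with a `nested` filter. It only builds the second request body; it does not send it, and counting users per transaction type would still need a facet on top of it.

```python
def build_step_two_body(facet_response, facet_name="tt_names"):
    """Turn the terms returned by step 1 into a terms filter for step 2.

    facet_name is a hypothetical name for the facet requested in step 1.
    """
    # Collect the distinct transaction types found for the filtered user base.
    terms = [entry["term"] for entry in facet_response["facets"][facet_name]["terms"]]

    # Match every user that has at least one nested transaction of those types.
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {
                    "nested": {
                        "path": "transactions",
                        "filter": {"terms": {"transactions.tt_name": terms}},
                    }
                },
            }
        }
    }
```

The drawback of this approach is the extra round trip: the client has to read the first response before it can issue the second request.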
I read that there is a pull request on GitHub that seems to implement this ( https://github.com/elasticsearch/elasticsearch/pull/3278 ), but it is not obvious to me whether it can already be used in the current release.
If not, are there any workarounds for this problem?
As additional information, here is my sample mapping:
curl -XPUT 'http://localhost:9200/store/user/_mapping' -d '
{
  "user": {
    "properties": {
      "user_id": { "type": "integer" },
      "gender": { "type": "string", "index": "not_analyzed" },
      "age": { "type": "integer" },
      "age_bracket": { "type": "string", "index": "not_analyzed" },
      "current_city": { "type": "string", "index": "not_analyzed" },
      "relationship_status": { "type": "string", "index": "not_analyzed" },
      "transactions": {
        "type": "nested",
        "properties": {
          "t_id": { "type": "integer" },
          "t_oid": { "type": "string", "index": "not_analyzed" },
          "t_name": { "type": "string", "index": "not_analyzed" },
          "tt_id": { "type": "integer" },
          "tt_name": { "type": "string", "index": "not_analyzed" }
        }
      }
    }
  }
}'
So, to get the result I want for my example use case, I would do the following:
- My filtered user base has this filter: "gender": "male" and "relationship_status": "single". For it, I want to get the distinct transaction types (field "tt_name" of the nested document) and count the number of distinct user_ids.
- Next, I want to query my complete user base (with no filter other than the list of transaction types from the first step) and count the number of distinct user_ids.
- Do the affinity calculations
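The final calculation step could be sketched in Python as follows. This assumes one possible definition of affinity, namely the share of users with a given transaction type in the filtered segment divided by the same share in the full user base; the function name `affinity_scores` and the example counts are hypothetical and not taken from real data.

```python
def affinity_scores(segment_counts, segment_size, base_counts, base_size):
    """Return {transaction_type: affinity} for each type seen in the segment.

    segment_counts / base_counts map tt_name -> number of distinct users.
    A value above 1.0 means the type is over-represented in the segment.
    """
    scores = {}
    for tt_name, segment_users in segment_counts.items():
        base_users = base_counts.get(tt_name, 0)
        # Skip types for which a ratio cannot be computed.
        if base_users == 0 or segment_size == 0 or base_size == 0:
            continue
        # (segment_users / segment_size) / (base_users / base_size),
        # rearranged so that only one division is performed.
        scores[tt_name] = (segment_users * base_size) / (base_users * segment_size)
    return scores


# Hypothetical counts: 100 single male users out of 1000 users in total.
example = affinity_scores(
    {"purchase": 30, "refund": 5}, 100,
    {"purchase": 200, "refund": 100}, 1000,
)
print(example)
```

With these made-up numbers, "purchase" would score above 1.0 (more common in the segment than overall) and "refund" below 1.0.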