For some of my queries to ElasticSearch, I want to return three pieces of information:
- Which terms T occur in the result set of documents?
- How often does each element of T occur in the result set of documents?
- How often does each element of T occur in the entire index (-> document frequency)?
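To make the three quantities concrete, here is a toy sketch in plain Python (not ElasticSearch code). The index contents, the document IDs and the function name `term_statistics` are made up for illustration; only the field values `fid8`, `fid6` and `fid10` are borrowed from my data.

```python
from collections import Counter

# Toy index: each document is the set of values in its "facets" field (made-up data).
index = {
    "doc1": {"fid8", "fid6"},
    "doc2": {"fid8"},
    "doc3": {"fid6", "fid10"},
    "doc4": {"fid10"},
}

# Hypothetical result set: the IDs of the documents matched by the query.
result_ids = {"doc1", "doc2"}


def term_statistics(index, result_ids):
    """Return {term: (count_in_result_set, document_frequency)}.

    Only terms that occur in the result set are returned (the set T),
    but the document frequency is counted over the entire index.
    """
    result_counts = Counter(t for doc_id in result_ids for t in index[doc_id])
    document_frequency = Counter(t for terms in index.values() for t in terms)
    return {t: (n, document_frequency[t]) for t, n in result_counts.items()}


stats = term_statistics(index, result_ids)
```

Here `fid10` is absent from `stats` because it does not occur in any document of the result set, while `fid6` is reported with its index-wide document frequency even though only one matching document contains it.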
The first two points are easily determined with the default terms facet or, nowadays, the terms aggregation. So my question is really about the third. Before ElasticSearch 1.x, i.e. before the move to the "aggregation" paradigm, I could use a terms facet with the "global" option set to true and a QueryFilter to get the document frequency ("global counts") of exactly the terms occurring in the document set specified by the QueryFilter. At first I thought I could do the same with a global aggregation, but it seems I can't. The reason is, if I understand correctly, that the original facet mechanism was centered around terms, whereas aggregation buckets are defined by the set of documents belonging to each bucket. That is, specifying the global option on a terms facet with a QueryFilter first determined the terms hit by the filter and then computed the facet values. Since the facet was global, I would get the global document counts.
With aggregations, it is different. The global aggregation can only be used as a top-level aggregation; it makes the aggregation ignore the current query results and compute its sub-aggregations, for example a terms aggregation, over all documents in the index. For me that is too much, since I WANT to restrict the returned terms ("buckets") to the terms occurring in the result set. But if I use a filter sub-aggregation with a terms sub-aggregation inside it, I again restrict the term buckets to the filter, and so get the normal facet counts instead of the document frequency. The reason is that the buckets are determined after the filter, so they are "too small". But I don't want to restrict the bucket sizes, I want to restrict the buckets to the terms in the query result set.
EDIT: I tried the following two aggregations:
- global_agg_with_filter_and_terms
- global_agg_with_terms_and_filter
Both are nested inside a global aggregation. This is the request:
    {
      "query": {
        "query_string": {
          "query": "text: my query string"
        }
      },
      "aggs": {
        "global_agg_with_filter_and_terms": {
          "global": {},
          "aggs": {
            "filter_agg": {
              "filter": {
                "query": {
                  "query_string": {
                    "query": "text: my query string"
                  }
                }
              },
              "aggs": {
                "terms_agg": {
                  "terms": {
                    "field": "facets"
                  }
                }
              }
            }
          }
        },
        "global_agg_with_terms_and_filter": {
          "global": {},
          "aggs": {
            "document_frequency": {
              "terms": {
                "field": "facets"
              },
              "aggs": {
                "term_count": {
                  "filter": {
                    "query": {
                      "query_string": {
                        "query": "text: my query string"
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
Response:
    {
      "took": 18,
      "timed_out": false,
      "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
      },
      "hits": {
        "total": 221,
        "max_score": 0.9839197,
        "hits": <omitted>
      },
      "aggregations": {
        "global_agg_with_filter_and_terms": {
          "doc_count": 1978,
          "filter_agg": {
            "doc_count": 221,
            "terms_agg": {
              "doc_count_error_upper_bound": 0,
              "sum_other_doc_count": 0,
              "buckets": [
                { "key": "fid8", "doc_count": 155 },
                { "key": "fid6", "doc_count": 40 },
                { "key": "fid9", "doc_count": 10 },
                { "key": "fid5", "doc_count": 9 },
                { "key": "fid13", "doc_count": 5 },
                { "key": "fid7", "doc_count": 2 }
              ]
            }
          }
        },
        "global_agg_with_terms_and_filter": {
          "doc_count": 1978,
          "document_frequency": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              { "key": "fid8", "doc_count": 1050, "term_count": { "doc_count": 155 } },
              { "key": "fid6", "doc_count": 668, "term_count": { "doc_count": 40 } },
              { "key": "fid9", "doc_count": 67, "term_count": { "doc_count": 10 } },
              { "key": "fid5", "doc_count": 65, "term_count": { "doc_count": 9 } },
              { "key": "fid7", "doc_count": 63, "term_count": { "doc_count": 2 } },
              { "key": "fid13", "doc_count": 55, "term_count": { "doc_count": 5 } },
              { "key": "fid10", "doc_count": 11, "term_count": { "doc_count": 0 } },
              { "key": "fid11", "doc_count": 9, "term_count": { "doc_count": 0 } },
              { "key": "fid12", "doc_count": 5, "term_count": { "doc_count": 0 } }
            ]
          }
        }
      }
    }
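Consider, for example, fid8 and fid6. In the first aggregation they have counts of 155 and 40, which are their counts within the result set. Now look at global_agg_with_terms_and_filter: there the doc_count values are 1050 and 668, the document frequencies, which is what I want. However, this aggregation also returns fid10 through fid12, whose term_count is 0, so they do not occur in the result set at all. I only want the terms that are also returned by global_agg_with_filter_and_terms.
So, is there a way to return only the buckets of terms in the result set, together with both term_count and doc_count?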
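To illustrate the output I am after, the sketch below post-filters the buckets of global_agg_with_terms_and_filter on the client side, in Python. It is only an illustration of the desired result, not something ElasticSearch does for me: the function name `buckets_in_result_set` and the output keys `document_frequency` and `result_count` are hypothetical, and `response` is an abbreviated copy of the response above, assumed to be already parsed into a dict.

```python
def buckets_in_result_set(response):
    """Keep only the buckets whose term occurs in the query result set.

    Assumes the response layout shown above: a terms aggregation named
    "document_frequency" with a filter sub-aggregation named "term_count".
    """
    buckets = (
        response["aggregations"]["global_agg_with_terms_and_filter"]
        ["document_frequency"]["buckets"]
    )
    return [
        {
            "key": b["key"],
            "document_frequency": b["doc_count"],            # count over the whole index
            "result_count": b["term_count"]["doc_count"],    # count within the result set
        }
        for b in buckets
        if b["term_count"]["doc_count"] > 0  # drop terms absent from the result set
    ]


# Abbreviated copy of the response above (three of the nine buckets).
response = {
    "aggregations": {
        "global_agg_with_terms_and_filter": {
            "doc_count": 1978,
            "document_frequency": {
                "buckets": [
                    {"key": "fid8", "doc_count": 1050, "term_count": {"doc_count": 155}},
                    {"key": "fid13", "doc_count": 55, "term_count": {"doc_count": 5}},
                    {"key": "fid10", "doc_count": 11, "term_count": {"doc_count": 0}},
                ]
            },
        }
    }
}

wanted = buckets_in_result_set(response)
```

With this excerpt, `wanted` keeps fid8 and fid13 with both counts and drops fid10. Doing this on the client still makes ElasticSearch compute and ship buckets for terms outside the result set, which is what I would like to avoid.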