Unique count of aggregated terms

I want to count the distinct values of a field in my dataset. For instance:

A terms aggregation gives me the number of entries per username. I only want the number of unique names, not all of the entries.

Here is my request:

    POST appzz/messages/_search
    {
        "aggs": {
            "words": {
                "terms": { "field": "username" }
            }
        },
        "size": 0,
        "from": 0
    }

Is there a unique parameter or something like that?

4 answers

You are looking for the cardinality aggregation, which was added in Elasticsearch 1.1. It lets you write a query like this:

    {
        "aggs": {
            "unique_users": {
                "cardinality": { "field": "username" }
            }
        }
    }
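If you are calling this from a script, here is a minimal Python sketch. The helper names cardinality_request and unique_count are hypothetical, and the sketch assumes the _search response has already been parsed from JSON into a dict. The distinct count comes back under aggregations.<name>.value, and it is an approximation rather than an exact count:

```python
import json


def cardinality_request(field, agg_name="unique_users"):
    """Build a _search body that skips hits and asks only for a distinct count."""
    return {
        "size": 0,  # no hits needed, only the aggregation result
        "aggs": {agg_name: {"cardinality": {"field": field}}},
    }


def unique_count(response, agg_name="unique_users"):
    """Read the approximate distinct count from an already-parsed _search response."""
    return response["aggregations"][agg_name]["value"]


if __name__ == "__main__":
    # The body you would POST to the _search endpoint of your index.
    print(json.dumps(cardinality_request("username")))
    # Made-up response, shaped like the JSON a cardinality aggregation returns.
    fake_response = {"aggregations": {"unique_users": {"value": 42}}}
    print(unique_count(fake_response))
```

The top-level "size": 0 only suppresses the hits; the aggregation still runs over every matching document.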

We discussed this at length with one of the ES developers at a recent Elasticsearch meetup we had here. The short answer is no, and according to him it is not expected any time soon.

One workaround is to fetch all the terms (set a really large size) and count how many are returned, but this is expensive and not practical if you have many unique terms.
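A minimal Python sketch of that workaround, assuming a parsed response dict. The helpers all_terms_request and count_terms and the default size of 100000 are illustrative choices, not part of Elasticsearch. The check on sum_other_doc_count only helps on Elasticsearch versions that report that field; when it is missing, the sketch simply assumes nothing was left out:

```python
def all_terms_request(field, max_terms=100000, agg_name="words"):
    """Terms aggregation with an arbitrarily large size, hoping to get every term back."""
    return {
        "size": 0,  # skip the hits, only the buckets matter
        "aggs": {agg_name: {"terms": {"field": field, "size": max_terms}}},
    }


def count_terms(response, agg_name="words"):
    """Count the buckets in a parsed terms-aggregation response.

    Newer Elasticsearch versions report sum_other_doc_count: the number of
    documents that fell into terms left out of the response. If it is above
    zero, the size was too small and the bucket count undercounts the terms.
    """
    agg = response["aggregations"][agg_name]
    if agg.get("sum_other_doc_count", 0) > 0:
        raise ValueError("size too small: some terms were not returned")
    return len(agg["buckets"])


if __name__ == "__main__":
    # Made-up response with three distinct usernames.
    fake = {"aggregations": {"words": {"sum_other_doc_count": 0, "buckets": [
        {"key": "alice", "doc_count": 5},
        {"key": "bob", "doc_count": 3},
        {"key": "carol", "doc_count": 1},
    ]}}}
    print(count_terms(fake))
```

Every bucket travels over the network, which is exactly why this gets expensive with many unique terms.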


@DerMiggel: I tried to use cardinality in my project. Surprisingly, on my local system, with a total dump of about 200,000 documents, I tried cardinality with precision_threshold values of 100, 0 and 40,000 (the maximum value). The first two runs gave different results (175 and 184, respectively), and 40,000 threw an out-of-memory exception. In addition, the computation time was huge compared to other aggregations. So I feel that cardinality is not really reliable and can bring your system down when high precision and accuracy are required.
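For reference, here is a minimal Python sketch of how the threshold is passed. The helper cardinality_with_threshold is hypothetical, and the negative-value check is my own guard, not something from the docs. The 40,000 cap matches the maximum mentioned above; per the cardinality docs, higher values behave like 40,000, and counts below the threshold are expected to be close to accurate, while larger counts are approximations:

```python
MAX_PRECISION_THRESHOLD = 40000  # documented maximum; larger values act like 40000


def cardinality_with_threshold(field, threshold, agg_name="unique_users"):
    """Cardinality aggregation body with an explicit precision_threshold.

    The threshold is clamped to the documented maximum so the body states
    the value Elasticsearch will effectively use.
    """
    if threshold < 0:
        raise ValueError("precision_threshold must not be negative")
    effective = min(threshold, MAX_PRECISION_THRESHOLD)
    return {
        "size": 0,
        "aggs": {
            agg_name: {
                "cardinality": {"field": field, "precision_threshold": effective}
            }
        },
    }


if __name__ == "__main__":
    # Compare the effective threshold for a few requested values.
    for t in (100, 40000, 100000):
        body = cardinality_with_threshold("username", t)
        print(t, body["aggs"]["unique_users"]["cardinality"]["precision_threshold"])
```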


I'm still pretty new to ES, but if I understand you correctly, couldn't you get the answer simply by counting the number of buckets returned in the response? (see http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html )

NOTE, though: contrary to what this document currently says about a size of 0 (that setting the size to 0 leaves the number of returned terms unlimited), my testing with the latest version (1.0.1 right now) shows that this does not work! On the contrary, setting the size to 0 gives you 0 buckets!!! For now, you have to (sigh) set the size to some arbitrarily high value if you want to get all the terms.

EDIT: Argh, my bad! I just re-read the document and noticed that there is a version note, so this behavior only arrives in 1.1.0. The note is written in the past tense ("Added in 1.1.0."), which is confusing, but I don't think 1.1.0 has been released yet....
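To cope with these version differences, a small Python sketch could pick the size based on the cluster version. The helper names are hypothetical and the version boundaries are assumptions: the 1.0.x behavior is what this answer observed, the 1.1.0 change comes from the documented note, and the 5.0 cutoff reflects my understanding that later releases reject a size of 0 for terms aggregations:

```python
def terms_size_for_all_terms(version, fallback=100000):
    """Pick a terms-aggregation size intended to return every term.

    version is a tuple such as (1, 0) or (1, 0, 1). Assumptions:
    - 1.0.x: size 0 returns no buckets (as observed here), so use fallback.
    - 1.1 up to the 2.x line: size 0 means "no limit".
    - 5.0 and later: size 0 is rejected, so use fallback again.
    """
    if (1, 1) <= tuple(version[:2]) < (5, 0):
        return 0
    return fallback


def terms_request(field, version, agg_name="words"):
    """Terms aggregation body using the version-dependent size chosen above."""
    size = terms_size_for_all_terms(version)
    return {
        "size": 0,  # top-level size: skip the hits
        "aggs": {agg_name: {"terms": {"field": field, "size": size}}},
    }
```

The fallback value is arbitrary; it just has to exceed the number of distinct values you expect.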

Oh, and by the way, there seems to be something wrong with your URL. I hope you're aware of that.

