How to do SELECT on results returned from GROUP BY?

I find it difficult to convert this simple SQL query below to Druid:

SELECT country, city, Count(*) FROM people_data WHERE name="Mary" GROUP BY country, city; 

So, I came up with this question:

 { "queryType": "groupBy", "dataSource" : "people_data", "granularity": "all", "metric" : "num_of_pages", "dimensions": ["country", "city"], "filter" : { "type" : "and", "fields" : [ { "type": "in", "dimension": "name", "values": ["Mary"] }, { "type" : "javascript", "dimension" : "email", "function" : "function(value) { return (value.length !== 0) }" } ] }, "aggregations": [ { "type": "longSum", "name": "num_of_pages", "fieldName": "count" } ], "intervals": [ "2016-07-20/2016-07-21" ] } 

The query above is executed, but it does not look like groupBy in the Druid data source is even evaluated, since I see people in my output with names other than Mary. Does anyone have any input on how to make this work?

+2
source share
1 answer

The simple answer is that you cannot select arbitrary dimensions in groupBy queries.

Strictly speaking, even an SQL query does not make sense. If for this combination of country, city there are many different values ​​for name and street , then how do you fit this into one line? You must fill them out, for example. using the max function.

In this case, you can include the same column in your data as the dimension and metric, for example. name_dim and name_metric , and enable the appropriate aggregation by your metric, max(name_metric) .

Please note that if these columns are name , etc. have high granularity values, this will destroy the Druid coagulation function.

+2
source

All Articles