Although I would prefer official support for secondary timestamp columns in Druid itself, I found the dirty hack I was looking for.
DataSource Schema
First of all, I wanted to know how many users were registered on each day, with the ability to aggregate them into groups by date / month / year.
Here is the data schema used:
"dataSchema": { "dataSource": "ds1", "parser": { "parseSpec": { "format": "json", "timestampSpec": { "column": "timestamp", "format": "iso" }, "dimensionsSpec": { "dimensions": [ "user_id", "platform", "register_time" ], "dimensionExclusions": [], "spatialDimensions": [] } } }, "metricsSpec": [ { "type" : "hyperUnique", "name" : "users", "fieldName" : "user_id" } ], "granularitySpec": { "type": "uniform", "segmentGranularity": "HOUR", "queryGranularity": "DAY", "intervals": ["2015-01-01/2017-01-01"] } },
Therefore, the data samples should look something like this (each record is one input event):
{"user_id": 4151948, "platform": "portal", "register_time": "2016-05-29T00:45:36.000Z", "timestamp": "2016-06-29T22:18:11.000Z"} {"user_id": 2871923, "platform": "portal", "register_time": "2014-05-24T10:28:57.000Z", "timestamp": "2016-06-29T22:18:25.000Z"}
As you can see, my "main" timestamp, against which these metrics are computed, is the timestamp field, while register_time is just a regular dimension stored as a string in ISO 8601 UTC format.
Aggregation
And now for the fun part: I managed to aggregate by both timestamp (as a date) and register_time (again as a date), thanks to the timeFormat extraction function.
The query looks like this:
{ "intervals": "2016-01-20/2016-07-01", "dimensions": [ { "type": "extraction", "dimension": "register_time", "outputName": "reg_date", "extractionFn": { "type": "timeFormat", "format": "YYYY-MM-dd", "timeZone": "Europe/Bratislava" , "locale": "sk-SK" } } ], "granularity": {"timeZone": "Europe/Bratislava", "period": "P1D", "type": "period"}, "aggregations": [{"fieldName": "users", "name": "users", "type": "hyperUnique"}], "dataSource": "ds1", "queryType": "groupBy" }
Filtering
The filtering solution is based on a JavaScript extraction function, with which I can convert the date to a UNIX timestamp and use it inside (for example) a bound filter. The lower and upper bounds below are simply the query interval endpoints expressed as epoch seconds:
{ "intervals": "2016-01-20/2016-07-01", "dimensions": [ "platform", { "type": "extraction", "dimension": "register_time", "outputName": "reg_date", "extractionFn": { "type": "javascript", "function": "function(x) {return Date.parse(x)/1000}" } } ], "granularity": {"timeZone": "Europe/Bratislava", "period": "P1D", "type": "period"}, "aggregations": [{"fieldName": "users", "name": "users", "type": "hyperUnique"}], "dataSource": "ds1", "queryType": "groupBy" "filter": { "type": "bound", "dimension": "register_time", "outputName": "reg_date", "alphaNumeric": "true" "extractionFn": { "type": "javascript", "function": "function(x) {return Date.parse(x)/1000}" } } }
I tried to filter "directly" with a JavaScript filter, but I was not able to convince Druid to return the correct entries, even though I double-checked the function in various JavaScript REPLs; but hey, I'm not a JavaScript expert.
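For completeness, a plain Druid JavaScript filter on register_time would look roughly like this sketch (untested, and quite possibly where my attempt went wrong; the millisecond bounds are again the query interval endpoints):

{
  "type": "javascript",
  "dimension": "register_time",
  "function": "function(x) { var t = Date.parse(x); return t >= 1453248000000 && t < 1467331200000; }"
}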