Optimal way to create a histogram / frequency distribution in Oracle?

I have an events table with two eventkey columns (a unique, primary key) and createtime that stores the time the event was generated as the number of milliseconds since January 1, 1970 in the NUMBER column.

I would like to create a “histogram” or frequency distribution that will show me how many events were generated for each hour of the past week.

Is this the best way to write such a query in Oracle using width_bucket() ? Is it possible to deduce the number of rows that fall into each bucket using one of the other Oracle analytic functions, and not use width_bucket to determine how many buckets each row belongs to, and do count(*) on it?

 -- 1305504000000 = 5/16/2011 12:00am GMT -- 1306108800000 = 5/23/2011 12:00am GMT select timestamp '1970-01-01 00:00:00' + numtodsinterval((1305504000000/1000 + (bucket * 60 * 60)), 'second') period_start, numevents from ( select bucket, count(*) as events from ( select eventkey, createtime, width_bucket(createtime, 1305504000000, 1306108800000, 24 * 7) bucket from events where createtime between 1305504000000 and 1306108800000 ) group by bucket ) order by period_start 
+7
source share
3 answers

If your createtime was a date column, this would be trivial:

 SELECT TO_CHAR(CREATE_TIME, 'DAY:HH24'), COUNT(*) FROM EVENTS GROUP BY TO_CHAR(CREATE_TIME, 'DAY:HH24'); 

Be that as it may, casting the createtime column createtime not too complicated:

 select TO_CHAR( TO_DATE('19700101', 'YYYYMMDD') + createtime / 86400000), 'DAY:HH24') AS BUCKET, COUNT(*) FROM EVENTS WHERE createtime between 1305504000000 and 1306108800000 group by TO_CHAR( TO_DATE('19700101', 'YYYYMMDD') + createtime / 86400000), 'DAY:HH24') order by 1 

If, as an alternative, you are looking for fencepost values ​​(for example, where can I go from the first decile (0-10%) to the next (11-20%), you would do something like:

 select min(createtime) over (partition by decile) as decile_start, max(createtime) over (partition by decile) as decile_end, decile from (select createtime, ntile (10) over (order by createtime asc) as decile from events where createtime between 1305504000000 and 1306108800000 ) 
+10
source

I am not familiar with Oracle date functions, but I am pretty sure that the equivalent way to write this Postgres statement is:

 select date_trunc('hour', stamp), count(*) from your_data group by date_trunc('hour', stamp) order by date_trunc('hour', stamp) 
+3
source

The answer is pretty much the same as Adam's, but I would prefer to save period_start as a time field, so it’s easier to filter if necessary:

 with events as ( select rownum eventkey, round(dbms_random.value(1305504000000, 1306108800000)) createtime from dual connect by level <= 1000 ) select trunc(timestamp '1970-01-01 00:00:00' + numtodsinterval(createtime/1000, 'second'), 'HH') period_start, count(*) numevents from events where createtime between 1305504000000 and 1306108800000 group by trunc(timestamp '1970-01-01 00:00:00' + numtodsinterval(createtime/1000, 'second'), 'HH') order by period_start 
+1
source

All Articles