Performing date and time aggregation in SQL

I have a data set that contains observations recorded every 2 minutes over several weeks, and I want to widen the time interval from 2 minutes to 5 minutes. The problem is that the observation frequency is not perfectly regular: in theory there should be 5 observations every 10 minutes, but in practice this is often not the case. How can I combine observations with an average, based on the date and time of each observation? In other words, I want to aggregate over every 5-minute window, even though the number of observations per window varies. The date and time are stored in timestamp format.

Sample data:

 1  2007-09-14 22:56:12  5.39
 2  2007-09-14 22:58:12  5.34
 3  2007-09-14 23:00:12  5.16
 4  2007-09-14 23:02:12  5.54
 5  2007-09-14 23:04:12  5.30
 6  2007-09-14 23:06:12  5.20

Expected results:

 1  2007-09-14 23:00  5.29
 2  2007-09-14 23:05  5.34
+7
4 answers

The answer to that question is likely to be a good solution to your problem as well: it shows ways to efficiently aggregate data into time windows.

Essentially use an avg aggregate with:

 GROUP BY floor(extract(epoch from the_timestamp) / 60 / 5) 
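For example, applied to the sample data above, the whole query could look like the following sketch (the table name observation and the columns the_timestamp and value are placeholders, not from the original post):

```sql
-- Bucket into 5-minute windows: 5*60 = 300 epoch seconds per window.
-- observation / the_timestamp / value are assumed names.
SELECT to_timestamp(floor(extract(epoch FROM the_timestamp) / (5*60)) * 5*60)
         AS bucket_start,
       avg(value) AS avg_value
FROM observation
GROUP BY 1
ORDER BY 1;
```

GROUP BY 1 groups on the first select-list expression, so the bucket start shown is exactly the grouping key.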
+6

EDIT: I thought about this a bit and realized you can't cleanly go from 2-minute samples to 5-minute windows; the intervals don't line up evenly. I'll keep thinking about that, but the following code works as soon as you have 1-minute data to aggregate!


If what you want is each timestamp floored to the beginning of its interval, you can use the code inside this function directly, or create the function in your database to make it easier to call:

 CREATE OR REPLACE FUNCTION dev.beginning_datetime_floor(timestamp without time zone, integer)
 /* switch out 'dev' with your schema name */
 RETURNS timestamp without time zone AS
 $BODY$
 SELECT date_trunc('minute',
        timestamp with time zone 'epoch'
        + floor(extract(epoch from $1) / ($2*60)) * $2*60 * interval '1 second')
        at time zone 'CST6CDT' /* change this to your time zone */
 $BODY$
 LANGUAGE sql VOLATILE;

You just pass in the integer number of minutes you want to floor to (use a number that divides an hour evenly: 1, 2, 3, 4, 5, 6, 10, 12, 15, 20, or 30). Here are a couple of results:

 select dev.beginning_datetime_floor('2012-01-01 02:02:21',2) 

= '2012-01-01 02:02:00'

 select dev.beginning_datetime_floor('2012-01-01 02:02:21',5) 

= '2012-01-01 02:00:00'

Just test it, and add or subtract time as needed to shift the resulting start and end timestamps using the built-in timestamp functions.

Once you have the right timestamp, do what Craig said: GROUP BY that timestamp in combination with your desired aggregate function (probably avg).
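Putting the pieces together, the aggregation might look like this sketch (the table name observation and the columns the_timestamp and value are placeholders assumed for illustration):

```sql
-- Floor each timestamp to its 5-minute window, then average per window.
-- The count(*) column is an extra, handy for spotting sparse windows.
SELECT dev.beginning_datetime_floor(the_timestamp, 5) AS five_minute_start,
       avg(value) AS avg_value,
       count(*)   AS n_observations
FROM observation
GROUP BY dev.beginning_datetime_floor(the_timestamp, 5)
ORDER BY five_minute_start;
```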

You can test / tune it with:

 date_trunc('minute',
     timestamp with time zone 'epoch'
     + floor(extract(epoch from your_datetime) / (interval_minutes*60))
       * interval_minutes*60 * interval '1 second')
     at time zone 'CST6CDT' /* change this to your time zone */

It may turn out that you want to round the timestamps instead, for example if the length of your intervals is irregular. For that you can write a similar function that rounds the timestamp rather than flooring it.

+2

The easiest option is to create a lookup table. In this table you store the intervals you want to aggregate over:

(adapt this to your own RDBMS date notation.)

 CREATE TABLE interval (
   start_time DATETIME,
   cease_time DATETIME
 );
 INSERT INTO interval SELECT '2012-10-22 12:00', '2012-10-22 12:05';
 INSERT INTO interval SELECT '2012-10-22 12:05', '2012-10-22 12:10';
 INSERT INTO interval SELECT '2012-10-22 12:10', '2012-10-22 12:15';
 INSERT INTO interval SELECT '2012-10-22 12:15', '2012-10-22 12:20';
 INSERT INTO interval SELECT '2012-10-22 12:20', '2012-10-22 12:25';
 INSERT INTO interval SELECT '2012-10-22 12:25', '2012-10-22 12:30';
 INSERT INTO interval SELECT '2012-10-22 12:30', '2012-10-22 12:35';
 INSERT INTO interval SELECT '2012-10-22 12:35', '2012-10-22 12:40';

Then you just join and aggregate...

 SELECT interval.start_time, AVG(observation.value)
 FROM interval
 LEFT JOIN observation
   ON observation.timestamp >= interval.start_time
  AND observation.timestamp <  interval.cease_time
 GROUP BY interval.start_time

NOTE: You only need to create and populate the interval table once; after that you can reuse it as many times as you like.
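Since the question's timestamps look like PostgreSQL, one way to avoid hand-writing the INSERTs is generate_series (a sketch; the date range here is arbitrary and the table layout follows the example above):

```sql
-- Populate one day's worth of 5-minute intervals in a single statement.
INSERT INTO interval (start_time, cease_time)
SELECT ts, ts + interval '5 minutes'
FROM generate_series(timestamp '2007-09-14 00:00',
                     timestamp '2007-09-14 23:55',
                     interval '5 minutes') AS ts;
```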

+1

Okay, so this is just one way to handle it. Hopefully it will get you thinking about how to transform the data for your analysis needs.

There is a prerequisite for running this code: you need a table with all possible one-minute timestamps. There are many ways to build one; I'll just use what I have, which is two tables: dim_time, which has every minute from (00:01:00) to (23:59:00), and another with all possible dates (dim_date). When you join them (ON 1=1), you get every possible minute of every possible day.

 --first you need to create some functions I'll use later
 --credit for this first function goes to David Walling
 CREATE OR REPLACE FUNCTION dev.beginning_datetime_floor(timestamp without time zone, integer)
 RETURNS timestamp without time zone AS
 $BODY$
 SELECT date_trunc('minute',
        timestamp with time zone 'epoch'
        + floor(extract(epoch from $1) / ($2*60)) * $2*60 * interval '1 second')
        at time zone 'CST6CDT'
 $BODY$
 LANGUAGE sql VOLATILE;

 --the following function is what I described in my previous post
 CREATE OR REPLACE FUNCTION dev.round_minutes(timestamp without time zone, integer)
 RETURNS timestamp without time zone AS
 $BODY$
 SELECT date_trunc('hour', $1)
        + cast(($2::varchar || ' min') as interval)
          * round(date_part('minute', $1)::float / cast($2 as float))
 $BODY$
 LANGUAGE sql VOLATILE;

 --load the data into a temp table; I added some data points
 --note: I got rid of the partial seconds
 SELECT cast(timestamp_original as timestamp) as timestamp_original, datapoint
 INTO TEMPORARY TABLE timestamps_second2
 FROM (
   SELECT '2007-09-14 22:56:12' as timestamp_original, 0 as datapoint
   UNION SELECT '2007-09-14 22:58:12' as timestamp_original, 1 as datapoint
   UNION SELECT '2007-09-14 23:00:12' as timestamp_original, 10 as datapoint
   UNION SELECT '2007-09-14 23:02:12' as timestamp_original, 100 as datapoint
   UNION SELECT '2007-09-14 23:04:12' as timestamp_original, 1000 as datapoint
   UNION SELECT '2007-09-14 23:06:12' as timestamp_original, 10000 as datapoint
 ) as data;

 --this is the bit of code you'll have to replace with your implementation of getting all possible minutes
 --you could generate a sequence of timestamps in R, or simply make the timestamps in Excel to test out the rest of the code
 --the result of the query is simply '2007-09-14 00:00:00' through '2007-09-14 23:59:00'
 SELECT *
 INTO TEMPORARY TABLE possible_timestamps
 FROM (
   SELECT the_date + beginning_minute as minute_timestamp
   FROM datawarehouse.dim_date as dim_date
   JOIN datawarehouse.dim_time as dim_time ON 1=1
   WHERE dim_date.the_date = '2007-09-14'
   GROUP BY the_date, beginning_minute
   ORDER BY the_date, beginning_minute
 ) as data;

 --round to the nearest minute (be sure to think about how this might change your results)
 SELECT *
 INTO TEMPORARY TABLE rounded_timestamps2
 FROM (
   SELECT dev.round_minutes(timestamp_original, 1) as minute_timestamp_rounded, datapoint
   FROM timestamps_second2
 ) as data;

 --join the minutes we have data for against the possible minutes
 --I used some subqueries so that when you select all from the table you'll see the important part (not needed)
 SELECT *
 INTO TEMPORARY TABLE joined_with_possibles
 FROM (
   SELECT *
   FROM (
     SELECT *,
            (MIN(minute_timestamp_rounded) OVER ()) as min_time,
            (MAX(minute_timestamp_rounded) OVER ()) as max_time
     FROM possible_timestamps as t1
     LEFT JOIN rounded_timestamps2 as t2
       ON t1.minute_timestamp = t2.minute_timestamp_rounded
     ORDER BY t1.minute_timestamp asc
   ) as inner_query
   WHERE minute_timestamp >= min_time AND minute_timestamp <= max_time
 ) as data;

 --here's the tricky part that might not suit your needs, but it's one method:
 --if a value is missing, grab the previous value;
 --if the prior value is also missing, grab the one before that; otherwise leave it null
 --best practice would be another case statement with 0,1,2 specifying which point was pulled, so you can count those when you aggregate
 SELECT *
 INTO TEMPORARY TABLE shifted_values
 FROM (
   SELECT *,
          case
            when datapoint is not null then datapoint
            when datapoint is null
                 and (lag(datapoint,1) over (order by minute_timestamp asc)) is not null
              then lag(datapoint,1) over (order by minute_timestamp asc)
            when datapoint is null
                 and (lag(datapoint,1) over (order by minute_timestamp asc)) is null
                 and (lag(datapoint,2) over (order by minute_timestamp asc)) is not null
              then lag(datapoint,2) over (order by minute_timestamp asc)
            else null
          end as last_good_value
   FROM joined_with_possibles
   ORDER BY minute_timestamp asc
 ) as data;

 --now we use the function from my previous post to make the timestamps to aggregate on
 SELECT *
 INTO TEMPORARY TABLE shifted_values_with_five_minute
 FROM (
   SELECT *, dev.beginning_datetime_floor(minute_timestamp, 5) as five_minute_timestamp
   FROM shifted_values
 ) as data;

 --finally we aggregate
 SELECT AVG(datapoint) as avg_datapoint, five_minute_timestamp
 FROM shifted_values_with_five_minute
 GROUP BY five_minute_timestamp;
+1
