Summarize Timeline Values in SQL

Question

Problem

I have a PostgreSQL database in which I am trying to summarize the revenue of a cash register over time. The cash register can have the status ACTIVE or INACTIVE, and I only want to sum the revenue earned while it was ACTIVE during a given time period.

I have two tables: one that records the revenue and one that records the status of the cash register:

    CREATE TABLE counters (
        id bigserial NOT NULL,
        "timestamp" timestamp with time zone,
        total_revenue bigint,
        id_of_machine character varying(50),
        CONSTRAINT counters_pkey PRIMARY KEY (id)
    );

    CREATE TABLE machine_lifecycle_events (
        id bigserial NOT NULL,
        event_type character varying(50),
        "timestamp" timestamp with time zone,
        id_of_affected_machine character varying(50),
        CONSTRAINT machine_lifecycle_events_pkey PRIMARY KEY (id)
    );

A counter record is added every minute, and total_revenue only increases. An entry is added to machine_lifecycle_events every time the state of a machine changes.

I have added an image illustrating the problem. It is the revenue during the blue periods that should be summed.

Timeline showing problem.

What I tried so far

I created a query that can give me the total revenue at a given instant:

    SELECT total_revenue
    FROM counters
    WHERE timestamp < '2014-03-05 11:00:00'
      AND id_of_machine = '1'
    ORDER BY timestamp DESC
    LIMIT 1

Questions

How do I calculate the revenue earned between two timestamps?
How do I determine the start and end timestamps of the blue periods when I have to compare the timestamps in machine_lifecycle_events with the input period?

Any ideas on how to attack this issue?

Update

Sample data:

    INSERT INTO counters VALUES
      (1, '2014-03-01 00:00:00', 100, '1')
    , (2, '2014-03-01 12:00:00', 200, '1')
    , (3, '2014-03-02 00:00:00', 300, '1')
    , (4, '2014-03-02 12:00:00', 400, '1')
    , (5, '2014-03-03 00:00:00', 500, '1')
    , (6, '2014-03-03 12:00:00', 600, '1')
    , (7, '2014-03-04 00:00:00', 700, '1')
    , (8, '2014-03-04 12:00:00', 800, '1')
    , (9, '2014-03-05 00:00:00', 900, '1')
    , (10, '2014-03-05 12:00:00', 1000, '1')
    , (11, '2014-03-06 00:00:00', 1100, '1')
    , (12, '2014-03-06 12:00:00', 1200, '1')
    , (13, '2014-03-07 00:00:00', 1300, '1')
    , (14, '2014-03-07 12:00:00', 1400, '1');

    INSERT INTO machine_lifecycle_events VALUES
      (1, 'ACTIVE', '2014-03-01 08:00:00', '1')
    , (2, 'INACTIVE', '2014-03-03 00:00:00', '1')
    , (3, 'ACTIVE', '2014-03-05 00:00:00', '1')
    , (4, 'INACTIVE', '2014-03-06 12:00:00', '1');

SQL Fiddle with sample data.

Example query:
The revenue between "2014-03-02 08:00:00" and "2014-03-06 08:00:00" is 300: 100 for the first ACTIVE period and 200 for the second ACTIVE period.

sql aggregate-functions postgresql date-arithmetic window-functions

uldall Mar 07 '14 at 15:04

3 answers

OK, I have an answer, but I had to assume that the id of machine_lifecycle_events can be used to determine successor and predecessor. For my solution to work better, you should have an explicit link between the ACTIVE and INACTIVE events. There may be other ways to solve it, but they would add even more complexity.

First, to get the revenue for all active periods of each machine, you can do the following:

    select c.id_of_machine, cycle_id, cycle_start, cycle_end, sum(total_revenue)
    from counters c
    join (
        select e1.id as cycle_id,
               e1.timestamp as cycle_start,
               e2.timestamp as cycle_end,
               e1.id_of_affected_machine as cycle_machine_id
        from machine_lifecycle_events e1
        join machine_lifecycle_events e2
          on e1.id + 1 = e2.id  -- this should be replaced with a specific column to find cycles which belong together
         and e1.id_of_affected_machine = e2.id_of_affected_machine
        where e1.event_type = 'ACTIVE'
    ) cycle
      on c.id_of_machine = cycle_machine_id
     and cycle_start <= c.timestamp
     and c.timestamp <= cycle_end
    group by c.id_of_machine, cycle_id, cycle_start, cycle_end
    order by c.id_of_machine, cycle_id

You can extend this query with additional conditions to get the revenue only for a given time period or for specific machines:

    select sum(total_revenue)
    from counters c
    join (
        select e1.id as cycle_id,
               e1.timestamp as cycle_start,
               e2.timestamp as cycle_end,
               e1.id_of_affected_machine as cycle_machine_id
        from machine_lifecycle_events e1
        join machine_lifecycle_events e2
          on e1.id + 1 = e2.id  -- this should be replaced with a specific column to find cycles which belong together
         and e1.id_of_affected_machine = e2.id_of_affected_machine
        where e1.event_type = 'ACTIVE'
    ) cycle
      on c.id_of_machine = cycle_machine_id
     and cycle_start <= c.timestamp
     and c.timestamp <= cycle_end
    where '2014-03-02 08:00:00' <= c.timestamp
      and c.timestamp <= '2014-03-06 08:00:00'
      and c.id_of_machine = '1'

As mentioned at the beginning and in the comments, my way of finding events that belong together is not suitable for more complex examples with multiple machines. The easiest way would be to have another column that always points to the preceding event. Another way would be a function that finds these events, but that solution could not make use of indexes.
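To illustrate the pairing problem, here is a minimal sketch that avoids the e1.id + 1 = e2.id join altogether by using the window function lead(). This is a different technique from the extra column or helper function suggested above. It assumes that the events of one machine strictly alternate between ACTIVE and INACTIVE; the output column names are only illustrative.

```sql
-- Sketch: pair each ACTIVE event with the next event of the same machine.
-- Assumption: events per machine strictly alternate ACTIVE / INACTIVE.
SELECT id_of_affected_machine AS cycle_machine_id
     , cycle_start
     , cycle_end                      -- NULL if the machine is still active
FROM  (
   SELECT id_of_affected_machine
        , event_type
        , "timestamp" AS cycle_start
        , lead("timestamp") OVER (PARTITION BY id_of_affected_machine
                                  ORDER BY "timestamp") AS cycle_end
   FROM   machine_lifecycle_events
   ) e
WHERE  event_type = 'ACTIVE'
ORDER  BY 1, 2;
```

Because the window is partitioned by machine, gaps in id or interleaved events from several machines do not matter here.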

peter Mar 07 '14 at 16:40

Use a self-join to build a table of intervals with the actual status of each interval.

    with intervals as (
        select e1.timestamp time1, e2.timestamp time2, e1.EVENT_TYPE as status
        from machine_lifecycle_events e1
        left join machine_lifecycle_events e2 on e2.id = e1.id + 1
    )
    select *
    from counters c
    join intervals i
      on (timestamp between i.time1 and i.time2 or i.time2 is null)
     and i.status = 'ACTIVE';

I did not aggregate, so that the full result set stays visible; adding the aggregation should be simple, I think. I also left out the machine id to keep the demonstration of this pattern simple.
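One possible way to add the aggregation is sketched below. Since total_revenue is a running total, the sketch takes max minus min per interval rather than a sum; that choice, the earned column name, and the explicit lower bound on open-ended intervals are assumptions on top of the query above, and the machine id is still left out.

```sql
-- Sketch only: revenue per ACTIVE interval, single machine assumed.
WITH intervals AS (
    SELECT e1."timestamp" AS time1
         , e2."timestamp" AS time2      -- NULL for the latest event
         , e1.event_type  AS status
    FROM   machine_lifecycle_events e1
    LEFT   JOIN machine_lifecycle_events e2 ON e2.id = e1.id + 1
)
SELECT i.time1, i.time2
     , max(c.total_revenue) - min(c.total_revenue) AS earned
FROM   intervals i
JOIN   counters c
  ON   c."timestamp" >= i.time1
 AND  (c."timestamp" <= i.time2 OR i.time2 IS NULL)
WHERE  i.status = 'ACTIVE'
GROUP  BY i.time1, i.time2
ORDER  BY i.time1;
```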

zealot Mar 07 '14 at 16:53

Erwin Brandstetter · Accepted Answer · 2014-03-08T00:12:50+0000

DB design

To make my work easier, I sanitized your DB design before tackling the questions:

    CREATE TEMP TABLE counter (
        id            bigserial PRIMARY KEY
      , ts            timestamp NOT NULL
      , total_revenue bigint NOT NULL
      , machine_id    int NOT NULL
    );

    CREATE TEMP TABLE machine_event (
        id            bigserial PRIMARY KEY
      , ts            timestamp NOT NULL
      , machine_id    int NOT NULL
      , status_active bool NOT NULL
    );

Test case in the fiddle.

Major points

Using ts instead of "timestamp". Never use base type names as column names.
Simplified and unified the name to machine_id and made it integer, as it should be, instead of varchar(50).
event_type varchar(50) should be an integer foreign key or an enum. Or even just a boolean for only active / inactive. Simplified to status_active bool.
Simplified and sanitized the INSERT statements.
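The sanitized INSERT statements from the fiddle are not reproduced here. As a rough sketch, the question's sample data could be loaded into the sanitized tables like this, assuming ACTIVE maps to true and INACTIVE to false and letting the serial id columns fill themselves:

```sql
-- Sketch: the question's sample data converted to the sanitized design.
INSERT INTO counter (ts, total_revenue, machine_id) VALUES
  ('2014-03-01 00:00',  100, 1)
, ('2014-03-01 12:00',  200, 1)
, ('2014-03-02 00:00',  300, 1)
, ('2014-03-02 12:00',  400, 1)
, ('2014-03-03 00:00',  500, 1)
, ('2014-03-03 12:00',  600, 1)
, ('2014-03-04 00:00',  700, 1)
, ('2014-03-04 12:00',  800, 1)
, ('2014-03-05 00:00',  900, 1)
, ('2014-03-05 12:00', 1000, 1)
, ('2014-03-06 00:00', 1100, 1)
, ('2014-03-06 12:00', 1200, 1)
, ('2014-03-07 00:00', 1300, 1)
, ('2014-03-07 12:00', 1400, 1);

INSERT INTO machine_event (ts, machine_id, status_active) VALUES
  ('2014-03-01 08:00', 1, true)    -- ACTIVE
, ('2014-03-03 00:00', 1, false)   -- INACTIVE
, ('2014-03-05 00:00', 1, true)    -- ACTIVE
, ('2014-03-06 12:00', 1, false);  -- INACTIVE
```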

Answers

Assumptions

total_revenue only increases (per question).
The borders of the outer time range are included.
Each "next" row per machine in machine_event has the opposite status_active.
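The last assumption can be checked before running the queries. A minimal sketch, using lag() on the sanitized machine_event table; the alias prev_status is only illustrative:

```sql
-- Sketch: list events that repeat the status of the previous event
-- of the same machine. An empty result means the assumption holds.
SELECT machine_id, ts, status_active
FROM  (
   SELECT machine_id, ts, status_active
        , lag(status_active) OVER (PARTITION BY machine_id
                                   ORDER BY ts) AS prev_status
   FROM   machine_event
   ) sub
WHERE  status_active = prev_status   -- first row per machine has NULL and is skipped
ORDER  BY machine_id, ts;
```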

1. How to calculate the income received between two timestamps?

    WITH span AS (
       SELECT '2014-03-02 12:00'::timestamp AS s_from  -- start of time range
            , '2014-03-05 11:00'::timestamp AS s_to    -- end of time range
       )
    SELECT machine_id, s.s_from, s.s_to
         , max(total_revenue) - min(total_revenue) AS earned
    FROM   counter c
         , span s
    WHERE  ts BETWEEN s_from AND s_to  -- borders included!
    AND    machine_id = 1
    GROUP  BY 1,2,3;

2. How to determine the timestamps of the beginning and end of blue periods when I need to compare the timestamps in machine_event with the input period?

This query covers all machines in the given time frame ( span ).
Add WHERE machine_id = 1 to the CTE cte to select a specific machine.

    WITH span AS (
       SELECT '2014-03-02 08:00'::timestamp AS s_from  -- start of time range
            , '2014-03-06 08:00'::timestamp AS s_to    -- end of time range
       )
    , cte AS (
       SELECT machine_id, ts, status_active, s_from
            , lead(ts, 1, s_to) OVER w AS period_end
            , first_value(ts)   OVER w AS first_ts
       FROM   span s
       JOIN   machine_event e ON e.ts BETWEEN s.s_from AND s.s_to
       WINDOW w AS (PARTITION BY machine_id ORDER BY ts)
       )
    SELECT machine_id, ts AS period_start, period_end  -- start in time frame
    FROM   cte
    WHERE  status_active

    UNION ALL  -- active start before time frame
    SELECT machine_id, s_from, ts
    FROM   cte
    WHERE  NOT status_active
    AND    ts =  first_ts
    AND    ts <> s_from

    UNION ALL  -- active start before time frame, no end in time frame
    SELECT machine_id, s_from, s_to
    FROM  (
       SELECT DISTINCT ON (1)
              e.machine_id, e.status_active, s.s_from, s.s_to
       FROM   span s
       JOIN   machine_event e ON e.ts < s.s_from  -- only from before time range
       LEFT   JOIN cte c USING (machine_id)
       WHERE  c.machine_id IS NULL                -- not in selected time range
       ORDER  BY e.machine_id, e.ts DESC          -- only the latest entry
       ) sub
    WHERE  status_active  -- only if active
    ORDER  BY 1, 2;

The result is a list of blue periods in your image.
SQL Fiddle demonstrates both.
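The two answers can be combined to get the revenue earned during active periods only. The following is a sketch rather than the query from the fiddle: it builds the periods in a simpler way (every status period per machine, clipped to the span with GREATEST / LEAST) and then applies max minus min per period as in answer 1. The CTE names period, active and per_period are made up for the illustration, and the same three assumptions apply.

```sql
WITH span AS (
   SELECT '2014-03-02 08:00'::timestamp AS s_from  -- start of time range
        , '2014-03-06 08:00'::timestamp AS s_to    -- end of time range
   )
, period AS (  -- every status period per machine; the last one is open-ended
   SELECT machine_id, status_active, ts AS p_from
        , lead(ts, 1, 'infinity'::timestamp)
             OVER (PARTITION BY machine_id ORDER BY ts) AS p_to
   FROM   machine_event
   )
, active AS (  -- active periods overlapping the span, clipped to its borders
   SELECT p.machine_id
        , GREATEST(p.p_from, s.s_from) AS a_from
        , LEAST(p.p_to, s.s_to)        AS a_to
   FROM   period p
   JOIN   span s ON p.p_from <= s.s_to AND p.p_to >= s.s_from
   WHERE  p.status_active
   )
, per_period AS (  -- revenue per clipped active period, borders included
   SELECT a.machine_id, a.a_from, a.a_to
        , max(c.total_revenue) - min(c.total_revenue) AS earned
   FROM   active a
   JOIN   counter c ON c.machine_id = a.machine_id
                   AND c.ts BETWEEN a.a_from AND a.a_to
   GROUP  BY 1, 2, 3
   )
SELECT machine_id, sum(earned) AS earned_active
FROM   per_period
GROUP  BY machine_id
ORDER  BY machine_id;
```

With the question's sample data loaded into the sanitized tables, this should give 300 for machine 1 (100 for the first active period plus 200 for the second), matching the example in the question.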

A recent similar question:
Sum of time difference between rows
