This question is related to this one. I have a table that contains power values for devices, and I need to calculate the energy consumption over a given time interval and return the 10 most energy-consuming devices. I created 192 devices and 7742208 measurement records (40324 per device), which is roughly how many records will accumulate in one month.
With this amount of data, my current query takes more than 40 seconds to complete, which is too long, since the time interval and the number of devices and measurements can grow much larger. Should I take a different approach than lag() OVER (PARTITION BY ...), and what other optimizations can be made? I would really appreciate suggestions with code examples.
PostgreSQL version 9.4
A query with sample values:
SELECT t.device_id,
       sum(len_y * extract(epoch FROM len_x)) AS total_consumption
FROM (
    SELECT m.id,
           m.device_id,
           m.power_total,
           m.created_at,
           m.power_total + lag(m.power_total) OVER (
               PARTITION BY device_id ORDER BY m.created_at
           ) AS len_y,
           m.created_at - lag(m.created_at) OVER (
               PARTITION BY device_id ORDER BY m.created_at
           ) AS len_x
    FROM measurements AS m
    WHERE m.created_at BETWEEN '2015-07-30 13:05:24.403552+00'::timestamp
                           AND '2015-08-27 12:34:59.826837+00'::timestamp
) AS t
GROUP BY t.device_id
ORDER BY total_consumption DESC
LIMIT 10;
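For reference, one alternative I have been considering is to drop lag() entirely and rely on the fact that my devices report at a fixed interval (1 minute in the test data below). This is only a sketch and I have not verified that it ranks devices the same way; with a constant sampling step the result should be roughly proportional to the one above:

-- Sketch: assumes a fixed 60-second sampling interval, so the integral of
-- power reduces to sum(power_total) * 60; no window function and no
-- per-device sort are needed.
SELECT device_id,
       sum(power_total) * 60 AS total_consumption
FROM measurements
WHERE created_at BETWEEN '2015-07-30 13:05:24.403552+00'::timestamp
                     AND '2015-08-27 12:34:59.826837+00'::timestamp
GROUP BY device_id
ORDER BY total_consumption DESC
LIMIT 10;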
Information about the table:
   Column    |           Type           |                         Modifiers
-------------+--------------------------+------------------------------------------------------------
 id          | integer                  | not null default nextval('measurements_id_seq'::regclass)
 created_at  | timestamp with time zone | default timezone('utc'::text, now())
 power_total | real                     |
 device_id   | integer                  | not null
Indexes:
    "measurements_pkey" PRIMARY KEY, btree (id)
    "measurements_device_id_idx" btree (device_id)
    "measurements_created_at_idx" btree (created_at)
Foreign-key constraints:
    "measurements_device_id_fkey" FOREIGN KEY (device_id) REFERENCES devices(id)
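The window function sorts by (device_id, created_at), which none of the existing indexes cover. A composite index matching that ordering is one thing I could try; a sketch (the index name is my own):

-- Matches the PARTITION BY device_id ORDER BY created_at ordering,
-- which might let the planner avoid the large explicit sort.
CREATE INDEX measurements_device_id_created_at_idx
    ON measurements (device_id, created_at);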
Query plan (EXPLAIN ANALYZE):
Limit  (cost=1317403.25..1317403.27 rows=10 width=24) (actual time=41077.091..41077.094 rows=10 loops=1)
  ->  Sort  (cost=1317403.25..1317403.73 rows=192 width=24) (actual time=41077.089..41077.092 rows=10 loops=1)
        Sort Key: (sum((((m.power_total + lag(m.power_total) OVER (?))) * date_part('epoch'::text, ((m.created_at - lag(m.created_at) OVER (?)))))))
        Sort Method: top-N heapsort  Memory: 25kB
        ->  GroupAggregate  (cost=1041700.67..1317399.10 rows=192 width=24) (actual time=25361.013..41076.562 rows=192 loops=1)
              Group Key: m.device_id
              ->  WindowAgg  (cost=1041700.67..1201314.44 rows=5804137 width=20) (actual time=25291.797..37839.727 rows=7742208 loops=1)
                    ->  Sort  (cost=1041700.67..1056211.02 rows=5804137 width=20) (actual time=25291.746..30699.993 rows=7742208 loops=1)
                          Sort Key: m.device_id, m.created_at
                          Sort Method: external merge  Disk: 257344kB
                          ->  Seq Scan on measurements m  (cost=0.00..151582.05 rows=5804137 width=20) (actual time=0.333..5112.851 rows=7742208 loops=1)
                                Filter: ((created_at >= '2015-07-30 13:05:24.403552'::timestamp without time zone) AND (created_at <= '2015-08-27 12:34:59.826837'::timestamp without time zone))
Planning time: 0.351 ms
Execution time: 41114.883 ms
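The inner sort spills to disk (external merge, ~257 MB), so one experiment would be to give the session more sort memory before re-running the query; a sketch (the 512MB value is arbitrary, chosen only for testing):

-- Per-session experiment only, not a postgresql.conf recommendation.
SET work_mem = '512MB';
-- Then re-run the EXPLAIN ANALYZE above to see whether the sort stays in memory.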
A script to create the test table and sample data:
CREATE TABLE measurements (
    id serial primary key,
    device_id integer,
    power_total real,
    created_at timestamp
);

INSERT INTO measurements (device_id, created_at, power_total)
SELECT device_id,
       now() + (i * interval '1 minute'),
       random() * (50 - 1) + 1
FROM (
    SELECT DISTINCT device_id,
           generate_series(0, 10) AS i
    FROM (SELECT generate_series(1, 5) AS device_id) AS dev_ids
) AS gen_table;
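The script above only creates 5 devices with 11 rows each; to reproduce the volume described at the top (192 devices, 40324 measurements each), the generator can be scaled up, e.g. as in this sketch (assuming the same 1-minute spacing):

-- Generates 192 * 40324 = 7742208 rows via a cross join of two series.
INSERT INTO measurements (device_id, created_at, power_total)
SELECT device_id,
       now() + (i * interval '1 minute'),
       random() * (50 - 1) + 1
FROM generate_series(1, 192) AS device_id,
     generate_series(0, 40323) AS i;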