Efficient time series querying in Postgres

Question

I have a table in my PG db that looks something like this:

id | widget_id | for_date | score |

Each linked widget has many of these entries. There is at most one entry per day per widget, but there are gaps.

What I want is a result set that contains all the widgets for each date since X. The dates are produced with generate_series:

  SELECT date.date::date FROM generate_series('2012-01-01'::timestamp with time zone,'now'::text::date::timestamp with time zone, '1 day') date(date) ORDER BY date.date DESC;

If there is no entry for a date for a given widget_id, I want to use the previous one. So, say widget 1337 has no entry for 2012-05-10 but has one for 2012-05-08; then I want the result set to show the 2012-05-08 entry for 2012-05-10 as well:

    Actual data:
    widget_id | for_date   | score
    1312      | 2012-05-07 | 20
    1337      | 2012-05-07 | 12
    1337      | 2012-05-08 | 41
    1337      | 2012-05-11 | 500

    Desired output based on generate series:
    widget_id | for_date   | score
    1312      | 2012-05-07 | 20
    1337      | 2012-05-07 | 12
    1312      | 2012-05-08 | 20
    1337      | 2012-05-08 | 41
    1312      | 2012-05-09 | 20
    1337      | 2012-05-09 | 41
    1312      | 2012-05-10 | 20
    1337      | 2012-05-10 | 41
    1312      | 2012-05-11 | 20
    1337      | 2012-05-11 | 500

Ultimately, I want to boil this down into a view, so that I have consistent data sets per day that I can query easily.

Edit: Improved presentation of sample data and expected results

+8

sql postgresql

TheDeadSerious Feb 14 '13 at 12:39

4 answers

First of all, you can use a much simpler generate_series() table expression. This one is equivalent to yours (except for the descending order, which contradicts the rest of your question anyway):

 SELECT generate_series('2012-01-01'::date, now()::date, '1d')::date

The date type is coerced to timestamptz automatically on input. The return type is timestamptz either way. I use a subquery below so that I can cast the output to date right away.
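To check the return type on your own server, a minimal sketch using pg_typeof() (the three-day range is only an illustration). Going by the explanation above, the first column should report timestamp with time zone, and the second column shows the effect of the cast:

```sql
-- Inspect the type generate_series() returns for date input,
-- and the type after casting each value to date.
SELECT pg_typeof(d)       AS raw_type
     , pg_typeof(d::date) AS cast_type
FROM   generate_series('2012-01-01'::date, '2012-01-03'::date, '1d') AS d
LIMIT  1;
```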

Next, max() as a window function returns exactly what you need: the highest value since the start of the frame, ignoring NULL values. Building on that, you get a radically simple query.

For a given widget_id

Most likely faster than the variants with CROSS JOIN or WITH RECURSIVE:
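To see the window behavior in isolation, here is a small sketch over inline data. The VALUES list is made up for illustration and stands in for the result of the LEFT JOIN, where days without a matching row carry NULL in for_date:

```sql
-- With ORDER BY and no explicit frame clause, the frame runs from the
-- first row up to the current row, so max() yields the latest non-NULL
-- for_date seen so far.
SELECT day
     , for_date
     , max(for_date) OVER (ORDER BY day) AS effective_date
FROM  (
   VALUES ('2012-05-08'::date, '2012-05-08'::date)
        , ('2012-05-09'::date, NULL)
        , ('2012-05-10'::date, NULL)
        , ('2012-05-11'::date, '2012-05-11'::date)
   ) t(day, for_date)
ORDER  BY day;
```

The two days without an entry inherit 2012-05-08 as effective_date, and 2012-05-11 takes over once it appears.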

    SELECT a.day, s.*
    FROM  (
       SELECT d.day
             ,max(s.for_date) OVER (ORDER BY d.day) AS effective_date
       FROM  (
          SELECT generate_series('2012-01-01'::date, now()::date, '1d')::date
          ) d(day)
       LEFT   JOIN score s ON s.for_date = d.day
                          AND s.widget_id = 1337 -- "for a given widget_id"
       ) a
    LEFT   JOIN score s ON s.for_date = a.effective_date
                       AND s.widget_id = 1337
    ORDER  BY a.day;

→ sqlfiddle

With this query you can put any column from score into the final SELECT list. I used s.* for simplicity. Pick the columns you need.

If you want the output to start with the first day that actually has a score, simply replace the last LEFT JOIN with JOIN.

General form for all widget_id values

Here I use CROSS JOIN to create a row for each widget on each date.

    SELECT a.day, a.widget_id, s.score
    FROM  (
       SELECT d.day, w.widget_id
             ,max(s.for_date) OVER (PARTITION BY w.widget_id
                                    ORDER BY d.day) AS effective_date
       FROM  (SELECT generate_series('2012-05-05'::date
                                    ,'2012-05-15'::date, '1d')::date AS day) d
       CROSS  JOIN (SELECT DISTINCT widget_id FROM score) AS w
       LEFT   JOIN score s ON s.for_date = d.day
                          AND s.widget_id = w.widget_id
       ) a
    JOIN   score s ON s.for_date = a.effective_date
                  AND s.widget_id = a.widget_id  -- instead of LEFT JOIN
    ORDER  BY a.day, a.widget_id;

→ sqlfiddle
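The question asks for a view in the end. As a rough sketch, the general form can be wrapped like this. The view name widget_score_daily is hypothetical, the start date is taken from the question, and score is the table name used in the queries above:

```sql
-- Hypothetical view name. now() is evaluated each time the view is
-- queried, so the date range always extends to the current day.
CREATE VIEW widget_score_daily AS
SELECT a.day AS for_date, a.widget_id, s.score
FROM  (
   SELECT d.day, w.widget_id
         ,max(s.for_date) OVER (PARTITION BY w.widget_id
                                ORDER BY d.day) AS effective_date
   FROM  (SELECT generate_series('2012-01-01'::date
                                ,now()::date, '1d')::date AS day) d
   CROSS  JOIN (SELECT DISTINCT widget_id FROM score) AS w
   LEFT   JOIN score s ON s.for_date = d.day
                      AND s.widget_id = w.widget_id
   ) a
JOIN   score s ON s.for_date = a.effective_date
              AND s.widget_id = a.widget_id;

-- Example use: a consistent snapshot of all widgets for one day.
SELECT * FROM widget_score_daily WHERE for_date = '2012-05-10';
```

The ORDER BY is left out of the view on purpose so that callers can sort as they need. A filter on for_date is applied on top of the window function, so the full date range is likely still computed for each query. Whether that is fast enough depends on the size of the table.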

+7

Erwin Brandstetter Feb 14 '13 at 14:51

Using your table structure, I created the following recursive CTE, which starts with the MIN(for_date) of each widget and increments until it reaches the MAX(for_date). I am not sure whether there is a more efficient way, but this works well:

    WITH RECURSIVE nodes_cte(widgetid, for_date, score) AS (
       -- First Widget Using Min Date
       SELECT w.widgetId, w.for_date, w.score
       FROM widgets w
          INNER JOIN (
             SELECT widgetId, Min(for_date) min_for_date
             FROM widgets
             GROUP BY widgetId
          ) minW ON w.widgetId = minW.widgetid
                AND w.for_date = minW.min_for_date
       UNION ALL
       SELECT n.widgetId, n.for_date + 1 for_date, coalesce(w.score,n.score) score
       FROM nodes_cte n
          INNER JOIN (
             SELECT widgetId, Max(for_date) max_for_date
             FROM widgets
             GROUP BY widgetId
          ) maxW ON n.widgetId = maxW.widgetId
          LEFT JOIN widgets w ON n.widgetid = w.widgetid
                AND n.for_date + 1 = w.for_date
       WHERE n.for_date + 1 <= maxW.max_for_date
    )
    SELECT *
    FROM nodes_cte
    ORDER BY for_date

Here is the SQL Fiddle.

And the returned results (format the date however you like):

    WIDGETID | FOR_DATE                   | SCORE
    1337     | May, 07 2012 00:00:00+0000 | 12
    1337     | May, 08 2012 00:00:00+0000 | 41
    1337     | May, 09 2012 00:00:00+0000 | 41
    1337     | May, 10 2012 00:00:00+0000 | 41
    1337     | May, 11 2012 00:00:00+0000 | 500

Please note: this assumes that the for_date field is a date. If it includes a time, you may need to use interval '1 day' in the query above instead of + 1.
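A minimal sketch of the difference, using literal values rather than the widgets table. In Postgres, adding an integer to a date advances it by that many days, while a timestamp needs an interval:

```sql
SELECT date '2012-05-07' + 1                            AS next_date       -- date + integer
     , timestamp '2012-05-07 10:30' + interval '1 day'  AS next_timestamp; -- timestamp + interval
```

With a timestamp column, n.for_date + 1 in the recursive term would therefore become n.for_date + interval '1 day'.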

Hope this helps.

+2

sgeddes Feb 14 '13 at 13:23

Data:

    DROP SCHEMA tmp CASCADE;
    CREATE SCHEMA tmp ;
    SET search_path=tmp;

    CREATE TABLE widget
            ( widget_id INTEGER NOT NULL
            , for_date DATE NOT NULL
            , score INTEGER
            , PRIMARY KEY (widget_id,for_date)
            );
    INSERT INTO widget(widget_id , for_date , score) VALUES
      (1312, '2012-05-07', 20)
    , (1337, '2012-05-07', 12)
    , (1337, '2012-05-08', 41)
    , (1337, '2012-05-11', 500)
            ;

Query:

    SELECT w.widget_id AS widget_id
            , cal::date AS for_date
            -- , w.for_date AS org_date
            , w.score AS score
    FROM generate_series( '2012-05-07'::timestamp , '2012-05-11'::timestamp
                        , '1day'::interval) AS cal
            -- "half cartesian" Join;
            -- will be restricted by the NOT EXISTS() below
    LEFT JOIN widget w ON w.for_date <= cal
    WHERE NOT EXISTS (
            SELECT * FROM widget nx
            WHERE nx.widget_id = w.widget_id
            AND nx.for_date <= cal
            AND nx.for_date > w.for_date
            )
    ORDER BY cal, w.widget_id
            ;

Result:

     widget_id |  for_date  | score
    -----------+------------+-------
          1312 | 2012-05-07 |    20
          1337 | 2012-05-07 |    12
          1312 | 2012-05-08 |    20
          1337 | 2012-05-08 |    41
          1312 | 2012-05-09 |    20
          1337 | 2012-05-09 |    41
          1312 | 2012-05-10 |    20
          1337 | 2012-05-10 |    41
          1312 | 2012-05-11 |    20
          1337 | 2012-05-11 |   500
    (10 rows)

0

wildplasser Feb 14 '13 at 19:42

Clodoaldo Neto · Accepted Answer · 2013-02-14T13:34:08+0000

SQL Fiddle

    select
        widget_id,
        for_date,
        case
            when score is not null then score
            else first_value(score) over (partition by widget_id, c order by for_date)
            end score
    from (
        select
            a.widget_id,
            a.for_date,
            s.score,
            count(score) over(partition by a.widget_id order by a.for_date) c
        from (
            select widget_id, g.d::date for_date
            from (
                select distinct widget_id
                from score
            ) s
            cross join
            generate_series(
                (select min(for_date) from score),
                (select max(for_date) from score),
                '1 day'
            ) g(d)
        ) a
        left join
        score s on a.widget_id = s.widget_id and a.for_date = s.for_date
    ) s
    order by widget_id, for_date
