Postgres window function (determining contiguous days)

Using Postgres 9.3, I am trying to calculate the number of contiguous days of a certain type of weather. Assume we have a regular time series with a weather forecast:

    date|weather
    "2016-02-01";"Sunny"
    "2016-02-02";"Cloudy"
    "2016-02-03";"Snow"
    "2016-02-04";"Snow"
    "2016-02-05";"Cloudy"
    "2016-02-06";"Sunny"
    "2016-02-07";"Sunny"
    "2016-02-08";"Sunny"
    "2016-02-09";"Snow"
    "2016-02-10";"Snow"

I want something to count the adjacent days of the same weather. The results should look something like this:

    date|weather|contiguous_days
    "2016-02-01";"Sunny";1
    "2016-02-02";"Cloudy";1
    "2016-02-03";"Snow";1
    "2016-02-04";"Snow";2
    "2016-02-05";"Cloudy";1
    "2016-02-06";"Sunny";1
    "2016-02-07";"Sunny";2
    "2016-02-08";"Sunny";3
    "2016-02-09";"Snow";1
    "2016-02-10";"Snow";2

I have banged my head against this several times, trying to use window functions. At first it looks like it should be no problem, but it turns out to be much more complicated than I expected.

Here is what I tried ...

    Select date, weather,
           Row_Number() Over (partition by weather order by date)
    from t_weather

Would it be better to simply compare the current row with the next one? How would you do that while keeping a running count? Any thoughts, ideas, or even solutions would be helpful! -Kip

+6
4 answers

You need to identify the adjacent rows where the weather is the same. You can do this by adding a grouping identifier. There is a simple method: within each weather type, subtract an increasing sequence of numbers from the dates. The result is constant for adjacent dates.

Once you have the grouping, the rest is row_number():

    Select date, weather,
           Row_Number() Over (partition by weather, grp order by date)
    from (select w.*,
                 (date - row_number() over (partition by weather order by date) * interval '1 day') as grp
          from t_weather w
         ) w;

SQL Fiddle here.
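To see why the difference stays constant inside a run, here is a minimal Python sketch that mimics the two row_number() calls outside the database. It is only an illustration of the idea, not the query itself; the names `rows` and `contiguous_days` are made up for this example, and the data is the sample from the question.

```python
from collections import defaultdict
from datetime import date, timedelta

# Sample data from the question: (date, weather).
rows = [
    (date(2016, 2, 1), "Sunny"), (date(2016, 2, 2), "Cloudy"),
    (date(2016, 2, 3), "Snow"), (date(2016, 2, 4), "Snow"),
    (date(2016, 2, 5), "Cloudy"), (date(2016, 2, 6), "Sunny"),
    (date(2016, 2, 7), "Sunny"), (date(2016, 2, 8), "Sunny"),
    (date(2016, 2, 9), "Snow"), (date(2016, 2, 10), "Snow"),
]

def contiguous_days(rows):
    """Return (date, weather, grp, counter) tuples ordered by date."""
    rn_by_weather = defaultdict(int)  # row_number() over (partition by weather order by date)
    rn_by_group = defaultdict(int)    # row_number() over (partition by weather, grp order by date)
    out = []
    for day, weather in sorted(rows):
        rn_by_weather[weather] += 1
        # grp = date - row_number * 1 day: both sides advance by one per
        # adjacent day, so grp does not move until there is a gap.
        grp = day - timedelta(days=rn_by_weather[weather])
        rn_by_group[(weather, grp)] += 1
        out.append((day, weather, grp, rn_by_group[(weather, grp)]))
    return out

for day, weather, grp, counter in contiguous_days(rows):
    print(day, weather, grp, counter)
```

For the three Sunny days from 2016-02-06 to 2016-02-08 the row numbers are 2, 3 and 4, so grp is 2016-02-04 for all of them, while the isolated Sunny day on 2016-02-01 gets a different grp.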

+2

I'm not sure how the query engine will handle scanning the same data set multiple times (a bit like computing the area under a curve), but it works:

    WITH v(date, weather) AS (
        VALUES
            ('2016-02-01'::date, 'Sunny'::text),
            ('2016-02-02', 'Cloudy'),
            ('2016-02-03', 'Snow'),
            ('2016-02-04', 'Snow'),
            ('2016-02-05', 'Cloudy'),
            ('2016-02-06', 'Sunny'),
            ('2016-02-07', 'Sunny'),
            ('2016-02-08', 'Sunny'),
            ('2016-02-09', 'Snow'),
            ('2016-02-10', 'Snow')
    ),
    changes AS (
        SELECT date,
               weather,
               -- 1 when the weather matches the previous day, 0 when a new run starts;
               -- lag() needs an explicit ORDER BY to be deterministic
               CASE WHEN lag(weather) OVER (ORDER BY date) = weather THEN 1 ELSE 0 END AS change
        FROM v
    )
    SELECT date,
           weather,
           (SELECT count(weather)               -- number of times the weather didn't change
            FROM changes v2
            WHERE v2.date <= v1.date
              AND v2.weather = v1.weather
              AND v2.date >= (                  -- bounded between changes of weather
                    SELECT max(date)
                    FROM changes v3
                    WHERE change = 0
                      AND v3.weather = v1.weather
                      AND v3.date <= v1.date)   --<-- here the expensive part
           ) curve
    FROM changes v1
+2

You can accomplish this with a recursive CTE as follows:

    WITH RECURSIVE CTE_ConsecutiveDays AS (
        SELECT my_date,
               weather,
               1 AS consecutive_days
        FROM My_Table T
        WHERE NOT EXISTS (SELECT *
                          FROM My_Table T2
                          WHERE T2.my_date = T.my_date - INTERVAL '1 day'
                            AND T2.weather = T.weather)
        UNION ALL
        SELECT T.my_date,
               T.weather,
               CD.consecutive_days + 1
        FROM CTE_ConsecutiveDays CD
        INNER JOIN My_Table T
                ON T.my_date = CD.my_date + INTERVAL '1 day'
               AND T.weather = CD.weather
    )
    SELECT *
    FROM CTE_ConsecutiveDays
    ORDER BY my_date;

Here's the SQL Fiddle script: http://www.sqlfiddle.com/#!15/383e5/3
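The recursion may be easier to follow as a plain loop. The following Python sketch is only an analogy for what the CTE does, under the assumption that there is one row per date; `table` and `consecutive_days` are hypothetical names, and the data is the sample from the question.

```python
from datetime import date, timedelta

# Sample data from the question as a date -> weather mapping (one row per date).
table = {
    date(2016, 2, 1): "Sunny", date(2016, 2, 2): "Cloudy",
    date(2016, 2, 3): "Snow", date(2016, 2, 4): "Snow",
    date(2016, 2, 5): "Cloudy", date(2016, 2, 6): "Sunny",
    date(2016, 2, 7): "Sunny", date(2016, 2, 8): "Sunny",
    date(2016, 2, 9): "Snow", date(2016, 2, 10): "Snow",
}

def consecutive_days(table):
    one_day = timedelta(days=1)
    # Anchor step (the NOT EXISTS part): days whose previous day is
    # missing or has different weather start a run with a count of 1.
    frontier = [(d, w, 1) for d, w in table.items() if table.get(d - one_day) != w]
    result = []
    # Recursive step: extend each run by one day while the next day has the
    # same weather; stop when no row produces a successor.
    while frontier:
        result.extend(frontier)
        frontier = [
            (d + one_day, w, n + 1)
            for d, w, n in frontier
            if table.get(d + one_day) == w
        ]
    return sorted(result)  # ORDER BY my_date

for row in consecutive_days(table):
    print(*row)
```

Each pass of the loop corresponds to one iteration of the recursive term, so the number of iterations equals the length of the longest run.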

+1

Here is another approach, based on this answer.

First we add a change column, which is 1 or 0 depending on whether the weather differs from the previous day.
Then we introduce a group_nr column by taking a running sum of change over (order by date). This gives a unique group number for each sequence of consecutive days with the same weather, since the sum increases only on the first day of each sequence.
Finally, we do row_number() over (partition by group_nr order by date) to produce the running count within each group.

    select date, weather,
           row_number() over (partition by group_nr order by date)
    from (
        select *,
               sum(change) over (order by date) as group_nr
        from (
            select *,
                   (weather != lag(weather, 1, '') over (order by date))::int as change
            from tmp_weather
        ) t1
    ) t2;

sqlfiddle (uses equivalent WITH syntax)
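The three steps can be traced by hand with a small Python sketch. This is just an illustration of the intermediate columns, not a replacement for the query; `weather` and `run_counters` are names invented for the example, and the list is assumed to be already ordered by date.

```python
from itertools import accumulate

# Weather values from the question, already ordered by date.
weather = ["Sunny", "Cloudy", "Snow", "Snow", "Cloudy",
           "Sunny", "Sunny", "Sunny", "Snow", "Snow"]

def run_counters(values):
    # Step 1: change is 1 when the value differs from the previous row.
    # Like lag(weather, 1, ''), the first row is compared with ''.
    previous = [""] + values[:-1]
    change = [int(cur != prev) for cur, prev in zip(values, previous)]
    # Step 2: running sum of change, i.e. sum(change) over (order by date).
    group_nr = list(accumulate(change))
    # Step 3: row_number() over (partition by group_nr order by date).
    seen = {}
    counters = []
    for g in group_nr:
        seen[g] = seen.get(g, 0) + 1
        counters.append(seen[g])
    return change, group_nr, counters

change, group_nr, counters = run_counters(weather)
print(change)    # [1, 1, 1, 0, 1, 1, 0, 0, 1, 0]
print(group_nr)  # [1, 2, 3, 3, 4, 5, 5, 5, 6, 6]
print(counters)  # [1, 1, 1, 2, 1, 1, 2, 3, 1, 2]
```

Unlike the date-minus-row_number trick, this variant depends only on the row order, so it would still group rows correctly if the series had missing dates (whether that is desirable depends on how gaps should be treated).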

+1
