Moving Average at Amazon Redshift

Is there a way to calculate a weighted moving average with a fixed window size in Amazon Redshift? In more detail, given a table with a date column and a column of values, for each date, a weighted average value is calculated from the window of the specified size with the weights indicated in the auxiliary table.

My search attempts so far have provided many examples for this with the help of window functions for simple averages (without weights), for example here . There are also some related suggestions for postgres, for example this SO question , however the Redshift feature set is quite rare compared to postgres and it does not support many of the advanced features that are offered.

+5
source share
1 answer

Assuming we have the following tables:

create temporary table _data (ref_date date, value int); insert into _data values ('2016-01-01', 34) , ('2016-01-02', 12) , ('2016-01-03', 25) , ('2016-01-04', 17) , ('2016-01-05', 22) ; create temporary table _weight (days_in_past int, weight int); insert into _weight values (0, 4) , (1, 2) , (2, 1) ; 

Then, if we want to calculate a moving average over a three-day window (including the current date), where values ​​close to the current date are assigned a higher weight than those that were in the past, we expect the weighted average for 2016-01-05 ( based on values ​​from 2016-01-05 , 2016-01-04 and 2016-01-03 ):

 (22*4 + 17*2 + 25*1) / (4+2+1) = 147 / 7 = 21 

And the query might look like this:

 with _prepare_window as ( select t1.ref_date , datediff(day, t2.ref_date, t1.ref_date) as days_in_past , t2.value * weight as weighted_value , weight , count(t2.ref_date) over(partition by t1.ref_date rows between unbounded preceding and unbounded following) as num_values_in_window from _data t1 left join _data t2 on datediff(day, t2.ref_date, t1.ref_date) between 0 and 2 left join _weight on datediff(day, t2.ref_date, t1.ref_date) = days_in_past order by t1.ref_date , datediff(day, t2.ref_date, t1.ref_date) ) select ref_date , round(sum(weighted_value)::float/sum(weight), 0) as weighted_average from _prepare_window where num_values_in_window = 3 group by ref_date order by ref_date ; 

Providing the result:

  ref_date | weighted_average ------------+------------------ 2016-01-03 | 23 2016-01-04 | 19 2016-01-05 | 21 (3 rows) 
0
source

Source: https://habr.com/ru/post/1211923/


All Articles