BigQuery SQL for a 28-day sliding window aggregate (without writing 28 lines of SQL)

I am trying to calculate a 28-day moving sum in BigQuery using the LAG function.

The top answer to the question "Bigquery SQL for sliding window aggregate", from Felipe Hoffa, suggests using the LAG function. An example of this might be:

SELECT
  spend + spend_lagged_1day + spend_lagged_2day + spend_lagged_3day + ... + spend_lagged_27day AS spend_28_day_sum,
  user,
  date
FROM (
  SELECT
    spend,
    LAG(spend, 1) OVER (PARTITION BY user ORDER BY date) spend_lagged_1day,
    LAG(spend, 2) OVER (PARTITION BY user ORDER BY date) spend_lagged_2day,
    LAG(spend, 3) OVER (PARTITION BY user ORDER BY date) spend_lagged_3day,
    ...
    LAG(spend, 27) OVER (PARTITION BY user ORDER BY date) spend_lagged_27day,
    user,
    date
  FROM user_spend
)

Is there any way to do this without writing out 28 lines of SQL?

1 answer

The BigQuery documentation does not do a good job of explaining the full range of window functions the tool supports, because it does not specify which expressions may appear after ROWS or RANGE. BigQuery actually supports the SQL 2003 standard for window functions, which is documented in various places on the web.

This means that you can get the desired effect with a single window function. The bound is 27 because that is the number of rows before the current one to include in the sum; together with the current row, the window covers 28 rows.

 SELECT spend, SUM(spend) OVER (PARTITION BY user ORDER BY date ROWS BETWEEN 27 PRECEDING AND CURRENT ROW), user, date FROM user_spend; 
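As a quick sanity check on the "27 preceding plus the current row" reasoning, here is a minimal sketch that runs the same window clause in SQLite through Python's sqlite3 module. SQLite is only a stand-in for BigQuery here, on the assumption that both follow the standard frame semantics; the sample data (one hypothetical user, a spend of 1 on each of 30 consecutive days) is invented for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for BigQuery (assumption: both
# follow the SQL standard for ROWS frames; needs SQLite 3.25 or later).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_spend (user TEXT, date TEXT, spend INTEGER)")

# Hypothetical data: one user, 30 consecutive days, spend of 1 per day.
sample = [("alice", f"2024-01-{d:02d}", 1) for d in range(1, 31)]
conn.executemany("INSERT INTO user_spend VALUES (?, ?, ?)", sample)

# Same frame as in the answer: 27 preceding rows plus the current row.
rows_window = conn.execute("""
    SELECT date,
           SUM(spend) OVER (
               PARTITION BY user ORDER BY date
               ROWS BETWEEN 27 PRECEDING AND CURRENT ROW
           ) AS spend_28_day_sum
    FROM user_spend
    ORDER BY date
""").fetchall()

# The sum grows by 1 per day until the frame is full, then stays at 28.
for date, total in rows_window:
    print(date, total)
```

Because every row has a spend of 1, the sum equals the number of rows in the frame, so it is easy to see that the frame never holds more than 28 rows.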

A RANGE bound can also be extremely useful. If your table is missing dates for some user, then 27 PRECEDING rows will reach back more than 27 days, whereas RANGE builds the window from the date values themselves. In the following query, the date field is a BigQuery TIMESTAMP, and the range is specified in microseconds. I would advise that whenever you do date math like this in BigQuery, you test it thoroughly to make sure it gives you the expected answer.

 SELECT spend, SUM(spend) OVER (PARTITION BY user ORDER BY date RANGE BETWEEN 27 * 24 * 60 * 60 * 1000000 PRECEDING AND CURRENT ROW), user, date FROM user_spend; 
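The difference between ROWS and RANGE on data with gaps can be seen in a small sketch, again using SQLite through Python's sqlite3 module as a stand-in for BigQuery. Several things here are assumptions for illustration: the data is invented, the ordering column is a plain integer day number (hypothetical column `day`) rather than a TIMESTAMP in microseconds, and numeric RANGE offsets need SQLite 3.28 or later.

```python
import sqlite3

# The answer's RANGE offset, 27 days expressed in microseconds.
MICROS_27_DAYS = 27 * 24 * 60 * 60 * 1000000

conn = sqlite3.connect(":memory:")
# Hypothetical schema: `day` is an integer day number, so the RANGE
# offset below is in days, not microseconds as in the BigQuery query.
conn.execute("CREATE TABLE user_spend (user TEXT, day INTEGER, spend INTEGER)")
conn.executemany(
    "INSERT INTO user_spend VALUES (?, ?, ?)",
    [("bob", 1, 10), ("bob", 2, 20), ("bob", 40, 5)],  # no rows for days 3..39
)

gap_result = conn.execute("""
    SELECT day,
           SUM(spend) OVER (
               PARTITION BY user ORDER BY day
               ROWS BETWEEN 27 PRECEDING AND CURRENT ROW
           ) AS by_rows,
           SUM(spend) OVER (
               PARTITION BY user ORDER BY day
               RANGE BETWEEN 27 PRECEDING AND CURRENT ROW
           ) AS by_range
    FROM user_spend
    ORDER BY day
""").fetchall()

# On day 40, ROWS still counts days 1 and 2 (they are among the previous
# 27 rows), while RANGE only counts rows whose day is within 13..40.
for day, by_rows, by_range in gap_result:
    print(day, by_rows, by_range)
```

On day 40 the ROWS version reports 35, because the two much older rows are still among the previous 27 rows, while the RANGE version reports 5. This is the kind of check worth running against your own data before trusting the microsecond arithmetic.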