BigQuery SQL for sliding window aggregate

Hi, I have a table that looks like this:

    Date        Customer  Pageviews
    2014/03/01  abc       5
    2014/03/02  xyz       8
    2014/03/03  abc       6

I want pageview aggregates grouped by week, where each weekly row shows the total over the preceding 30 days (a sliding window aggregate with a 30-day window and a one-week step).

I am using Google BigQuery.

EDIT: Gordon, regarding your comment on "Customer": what I actually need is a little more complicated, which is why I included the customer in the table above. I am looking for the number of customers with more than 30 pageviews in a 30-day window, computed every week. Something like this:

    Date        Customers>10 pageviews in 30-day window
    2014/02/01  10
    2014/02/08  5
    2014/02/15  6
    2014/02/22  15

However, to keep this simple, it would be enough if I could just get a sliding window of the aggregate pageviews, ignoring customers altogether. Something like this:

    Date        Count of pageviews in 30-day window
    2014/02/01  50
    2014/02/08  55
    2014/02/15  65
    2014/02/22  75
2 answers

How about this:

    SELECT
      changes + changes1 + changes2 + changes3 changes28days,
      login,
      USEC_TO_TIMESTAMP(week)
    FROM (
      -- For each weekly row, pull in the totals from the 3 preceding rows.
      SELECT
        changes,
        LAG(changes, 1) OVER (PARTITION BY login ORDER BY week) changes1,
        LAG(changes, 2) OVER (PARTITION BY login ORDER BY week) changes2,
        LAG(changes, 3) OVER (PARTITION BY login ORDER BY week) changes3,
        login,
        week
      FROM (
        -- Weekly total of changed files per user.
        SELECT
          SUM(payload_pull_request_changed_files) changes,
          UTC_USEC_TO_WEEK(created_at, 1) week,
          actor_attributes_login login,
        FROM [publicdata:samples.github_timeline]
        WHERE payload_pull_request_changed_files > 0
        GROUP BY week, login
      )
    )
    HAVING changes28days > 0

For each user, the innermost query counts how many changes they submitted per week. Then, with LAG(), we can look at the previous rows to see how many changes they submitted 1, 2 and 3 weeks earlier. Finally, we add up these 4 weeks to see how many changes were submitted in the last 28 days.
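
To make the LAG() logic concrete, here is a minimal Python sketch of the same three steps applied to rows shaped like the question's table (date, customer, pageviews). It is an illustration only, not BigQuery code: the function names `week_start` and `rolling_four_week_totals` are hypothetical. Like LAG(), it looks back by rows rather than by calendar weeks, so it only matches a true 28-day window when a customer has a row for every week. Rows with fewer than three predecessors are dropped, because in the query their sum is NULL and the HAVING clause removes it.

```python
from collections import defaultdict
from datetime import date, timedelta


def week_start(day):
    """Return the Monday of the week containing `day`."""
    return day - timedelta(days=day.weekday())


def rolling_four_week_totals(rows):
    """rows: iterable of (date, customer, pageviews) tuples.

    Returns {customer: [(week, total), ...]} where each total covers the
    customer's current weekly row plus the three rows before it.
    """
    # Step 1: weekly totals per customer (the innermost GROUP BY).
    weekly = defaultdict(lambda: defaultdict(int))
    for day, customer, pageviews in rows:
        weekly[customer][week_start(day)] += pageviews

    result = {}
    for customer, per_week in weekly.items():
        weeks = sorted(per_week)  # ORDER BY week within PARTITION BY customer
        totals = []
        for i, week in enumerate(weeks):
            # Step 2: LAG(..., 1..3) needs three earlier rows; without them
            # the sum would be NULL, so the row is skipped.
            if i < 3:
                continue
            # Step 3: add the current row and the three lagged rows.
            window = weeks[i - 3:i + 1]
            totals.append((week, sum(per_week[w] for w in window)))
        result[customer] = totals
    return result
```

Calling `rolling_four_week_totals` with a list of `(date, customer, pageviews)` tuples returns, per customer, one total for every week that has at least three earlier weekly rows.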

Now you can wrap everything in an outer query to filter for users with changes > X and count them.
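
The wrapping step can be sketched in Python as well. This assumes the per-customer window totals are already available as a dictionary of `(week, total)` lists; the function name `customers_over_threshold` and its parameters are hypothetical, and the threshold stands in for the X above.

```python
from collections import Counter


def customers_over_threshold(window_totals, threshold):
    """Count, per week, the customers whose window total exceeds `threshold`.

    window_totals: {customer: [(week, total), ...]}
    Returns {week: number_of_customers}, ordered by week.
    """
    counts = Counter()
    for customer, totals in window_totals.items():
        for week, total in totals:
            # The outer query's filter: keep only totals above the threshold.
            if total > threshold:
                counts[week] += 1
    # Weeks in which no customer passes the filter do not appear at all,
    # just as they would be absent from a filtered GROUP BY result.
    return dict(sorted(counts.items()))
```

With the question's wording, `threshold` would be 30 and the result would correspond to the "customers per week" table the asker wants.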


I created the following time periods table:

    Table Details: Dim_Periods
    Schema
      Date        TIMESTAMP
      Year        INTEGER
      Month       INTEGER
      day         INTEGER
      QUARTER     INTEGER
      DAYOFWEEK   INTEGER
      MonthStart  TIMESTAMP
      MonthEnd    TIMESTAMP
      WeekStart   TIMESTAMP
      WeekEnd     TIMESTAMP
      Back30Days  TIMESTAMP  -- the date 30 days before "Date"
      Back7Days   TIMESTAMP  -- the date 7 days before "Date"

and I use a query like this to compute running totals:

    SELECT Date, COUNT(*) AS MovingCNT
    FROM (
      -- Period rows for the last 5 months, up to but not including today.
      SELECT Date, Back7Days
      FROM DWH.Dim_Periods
      WHERE Date < TIMESTAMP(CURRENT_DATE())
        AND Date >= DATE_ADD(CURRENT_TIMESTAMP(), -5, 'month')
    ) P
    CROSS JOIN EACH (
      SELECT repository_url, repository_created_at
      FROM [publicdata:samples.github_timeline]
    ) L
    -- Keep only events that fall inside each period's 7-day window.
    WHERE TIMESTAMP(repository_created_at) >= Back7Days
      AND TIMESTAMP(repository_created_at) <= Date
    GROUP EACH BY Date

Note that the same approach works for other aggregates such as "month to date", "week to date", "last 30 days", etc. However, performance is not the best, and the query may take some time on large data sets because of the Cartesian join. Hope this helps.
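
As a rough illustration of why this pattern is flexible but slow, the following Python sketch mirrors the query: every period date is compared against every event, and only the pairs inside the window are counted. The function name `moving_counts` and the `window_days` parameter are hypothetical; `window_days` plays the role of the Back7Days or Back30Days column.

```python
from datetime import date, timedelta


def moving_counts(period_dates, event_dates, window_days=7):
    """Count events in [day - window_days, day] for each period date.

    Both bounds are inclusive, matching the >= and <= in the WHERE clause.
    """
    counts = {}
    for day in period_dates:
        back = day - timedelta(days=window_days)  # the "BackNDays" value
        # Cartesian step: each period date scans all events, so the cost
        # grows with len(period_dates) * len(event_dates).
        matched = sum(1 for event in event_dates if back <= event <= day)
        # GROUP BY runs after the WHERE filter, so period dates with no
        # matching events produce no output row.
        if matched:
            counts[day] = matched
    return counts
```

Passing `window_days=30` gives the 30-day variant; swapping the lower bound for a month-start or week-start date would give the "to date" variants in the same way.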
