SQL: difference between SUM calculated over a period

Question

I have a table that looks like this:

    CREATE TABLE foobar (
        id                 SERIAL PRIMARY KEY,
        data_entry_date    DATE NOT NULL,
        user_id            INTEGER NOT NULL,
        wine_glasses_drunk INTEGER NOT NULL,
        whisky_shots_drunk INTEGER NOT NULL,
        beer_bottle_drunk  INTEGER NOT NULL
    );

    INSERT INTO foobar (data_entry_date, user_id, wine_glasses_drunk, whisky_shots_drunk, beer_bottle_drunk) VALUES
        ('2011-01-01', 1, 1, 0, 1),
        ('2011-01-02', 1, 4, 0, 1),
        ('2011-01-03', 1, 0, 0, 1),
        ('2011-01-04', 1, 1, 0, 1),
        ('2011-01-05', 1, 2, 1, 1),
        ('2011-01-07', 1, 1, 2, 1),
        ('2011-01-08', 1, 4, 0, 1),
        ('2011-01-11', 1, 1, 1, 1),
        ('2011-01-12', 1, 1, 0, 1),
        ('2011-01-13', 1, 2, 0, 1),
        ('2011-01-14', 1, 1, 0, 1),
        ('2011-01-15', 1, 9, 3, 1),
        ('2011-01-16', 1, 0, 4, 2),
        ('2011-01-17', 1, 0, 5, 3),
        ('2011-01-18', 1, 2, 2, 5),
        ('2011-01-20', 1, 1, 1, 1),
        ('2011-01-23', 1, 1, 3, 1),
        ('2011-01-24', 1, 0, 0, 1),
        ('2011-02-01', 1, 1, 1, 1),
        ('2011-02-02', 1, 2, 3, 4),
        ('2011-02-05', 1, 1, 2, 2),
        ('2011-02-09', 1, 0, 0, 1),
        ('2011-02-10', 1, 1, 1, 1),
        ('2011-02-11', 1, 3, 6, 3);

I want to write a query that shows the difference in total wine_glasses_drunk, total whisky_shots_drunk and total beer_bottle_drunk for a given period, compared with the totals for the previous period.

It probably sounds more complicated than it is. If we use a period of 1 week == 7 days, then the query should return the difference between the amounts consumed this week and the amounts consumed last week.

A slight complication is that the dates in the table are not contiguous, i.e. some dates are missing, so the query needs to find the nearest available date when determining the boundaries of each period.

This is what I have so far:

    -- using hard coded dates
    SELECT (SUM(f1.wine_glasses_drunk) - SUM(f2.wine_glasses_drunk)) as wine_diff,
           (SUM(f1.whisky_shots_drunk) - SUM(f2.whisky_shots_drunk)) as whisky_diff,
           (SUM(f1.beer_bottle_drunk) - SUM(f2.beer_bottle_drunk)) as beer_diff
    FROM foobar f1
    INNER JOIN foobar f2 ON f2.user_id = f1.user_id
    WHERE f1.user_id = 1
      AND f1.data_entry_date BETWEEN '2011-01-08' AND '2011-01-15'
      AND f2.data_entry_date BETWEEN '2011-01-01' AND '2011-01-08'
      AND f1.data_entry_date - f2.data_entry_date between 6 and 9;

The above SQL is obviously a hack (especially the criterion f1.data_entry_date - f2.data_entry_date between 6 and 9). I checked the results in Excel, and the query results were wrong.

How can I write this query - and how can I change it so that it can deal with non-contiguous dates in the database?

I use PostgreSQL, but would prefer database-agnostic (i.e. ANSI) SQL if possible.

+4

sql postgresql

Homunculus reticulli Jun 11 '12 at 19:20

4 answers

I am not entirely sure I have understood your description correctly, but I would use two different functions to get the desired result.

First, look at the date_trunc function. It returns the date of the first day of the week, and you can group by it to get the sums for each week. If that first day of the week is not the one you want, you can use date arithmetic to adjust it. I think the first day of the week it returns is Monday.

Secondly, you can use the lag window function to find the sum from the previous row. Note that if a week is missing, this function looks at the previous row, not at the previous week. I put a check in the query to make sure the database is looking at the right row.
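As a rough illustration of the week truncation described above, the sketch below shows which week start PostgreSQL assigns to a few dates. The inline list of dates is sample data only, and the Sunday-based column is an assumption about what "adjusting with date arithmetic" could look like (shift the date forward one day, truncate, shift back); it is not taken from the answer's own query.

```sql
-- date_trunc('week', ...) truncates to the Monday that starts the week.
-- 2011-01-05 is a Wednesday, so its Monday-based week starts on 2011-01-03.
select d as entry_date,
       date_trunc('week', d)::date as monday_week_start,
       -- Hypothetical Sunday-based week: move one day forward, truncate, move back.
       (date_trunc('week', d + 1) - interval '1 day')::date as sunday_week_start
from (values (date '2011-01-01'),
             (date '2011-01-02'),
             (date '2011-01-05'),
             (date '2011-01-08')) as t(d)
order by d;
```

Note that 2011-01-01 is a Saturday, so calendar weeks produced this way do not line up with 7-day periods counted from the first date in the sample data.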

    select user_id,
           week_start_date,
           this_week_wine_glasses_drunk
             - case when is_consecutive_weeks = 'TRUE' then last_week_wine_glasses_drunk else 0 end as wine_glasses_drunk,
           this_week_whisky_shots_drunk
             - case when is_consecutive_weeks = 'TRUE' then last_week_whisky_shots_drunk else 0 end as whisky_shots_drunk,
           this_week_beer_bottle_drunk
             - case when is_consecutive_weeks = 'TRUE' then last_week_beer_bottle_drunk else 0 end as beer_bottle_drunk
    from (
        select user_id,
               week_start_date,
               this_week_wine_glasses_drunk,
               this_week_whisky_shots_drunk,
               this_week_beer_bottle_drunk,
               case when (lag(week_start_date) over (partition by user_id order by week_start_date) + interval '7' day) = week_start_date
                    then 'TRUE'
               end as is_consecutive_weeks,
               lag(this_week_wine_glasses_drunk) over (partition by user_id order by week_start_date) as last_week_wine_glasses_drunk,
               lag(this_week_whisky_shots_drunk) over (partition by user_id order by week_start_date) as last_week_whisky_shots_drunk,
               lag(this_week_beer_bottle_drunk) over (partition by user_id order by week_start_date) as last_week_beer_bottle_drunk
        from (
            select user_id,
                   date_trunc('week', data_entry_date) as week_start_date,
                   sum(wine_glasses_drunk) as this_week_wine_glasses_drunk,
                   sum(whisky_shots_drunk) as this_week_whisky_shots_drunk,
                   sum(beer_bottle_drunk) as this_week_beer_bottle_drunk
            from foobar
            group by user_id, date_trunc('week', data_entry_date)
        ) a
    ) b

An SQL Fiddle is available so you can take a look.

By the way, I come from an Oracle background and hacked this together using the PostgreSQL documentation and SQL Fiddle. I hope this is what you need.
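To see why the consecutive-week check in this answer matters, here is a minimal sketch on made-up weekly totals in which the week starting 2011-01-17 is missing. The `weekly` CTE, its column names and its values are hypothetical and exist only for this illustration.

```sql
-- Hypothetical weekly totals; the week starting 2011-01-17 has no row.
with weekly(week_start, total) as (
    values (date '2011-01-03', 5),
           (date '2011-01-10', 8),
           (date '2011-01-24', 3)
)
select week_start,
       total,
       -- lag() returns the value from the previous ROW, whatever week that is.
       lag(total) over (order by week_start) as previous_row_total,
       -- True only when the previous row is exactly 7 days earlier.
       lag(week_start) over (order by week_start) + 7 = week_start as is_consecutive
from weekly
order by week_start;
```

For the 2011-01-24 row, lag picks up the total from 2011-01-10, which is two weeks earlier, so is_consecutive comes out false and that value should not be treated as last week's total.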

+2

Mike meyers Jun 14 '12 at 17:39

A slightly different approach (I will let you fill in the date parameters.):

    -- T-SQL (SQL Server) syntax; each variable needs its own type
    Declare @StartDate1 Date, @EndDate1 Date, @StartDate2 Date, @EndDate2 Date
    Set @StartDate1 = '6/1/2012'
    Set @EndDate1 = '6/15/2012'
    Set @StartDate2 = '6/16/2012'
    Set @EndDate2 = '6/30/2012'

    SELECT SUM(U.WineP1) - SUM(U.WineP2) AS WineDiff,
           SUM(U.WhiskeyP1) - SUM(U.WhiskeyP2) AS WhiskeyDiff,
           SUM(U.BeerP1) - SUM(U.BeerP2) AS BeerDiff
    FROM (
        -- totals for period 1, with zeros in the period 2 columns
        SELECT SUM(wine_glasses_drunk) AS WineP1,
               SUM(whisky_shots_drunk) AS WhiskeyP1,
               SUM(beer_bottle_drunk) AS BeerP1,
               0 AS WineP2, 0 AS WhiskeyP2, 0 AS BeerP2
        FROM foobar
        WHERE data_entry_date BETWEEN @StartDate1 AND @EndDate1
        UNION ALL
        -- totals for period 2, with zeros in the period 1 columns
        SELECT 0 AS WineP1, 0 AS WhiskeyP1, 0 AS BeerP1,
               SUM(wine_glasses_drunk) AS WineP2,
               SUM(whisky_shots_drunk) AS WhiskeyP2,
               SUM(beer_bottle_drunk) AS BeerP2
        FROM foobar
        WHERE data_entry_date BETWEEN @StartDate2 AND @EndDate2
    ) AS U
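Since the question targets PostgreSQL, which has no `@variable` syntax in plain SQL, the same two-period comparison can be sketched there with conditional aggregation in a single pass. This is an alternative formulation rather than the answer's own query; it assumes the foobar table and sample rows from the question, and the two hard-coded weeks (2011-01-01 to 2011-01-07 and 2011-01-08 to 2011-01-14) are chosen only for illustration.

```sql
-- Rows in the later week count positively, rows in the earlier week negatively,
-- so each SUM is (later week total) - (earlier week total).
select sum(case when data_entry_date >= date '2011-01-08'
                then wine_glasses_drunk else -wine_glasses_drunk end) as wine_diff,
       sum(case when data_entry_date >= date '2011-01-08'
                then whisky_shots_drunk else -whisky_shots_drunk end) as whisky_diff,
       sum(case when data_entry_date >= date '2011-01-08'
                then beer_bottle_drunk else -beer_bottle_drunk end) as beer_diff
from foobar
where user_id = 1
  -- Half-open range covering exactly the two 7-day periods.
  and data_entry_date >= date '2011-01-01'
  and data_entry_date <  date '2011-01-15';
```

With the sample rows from the question this returns wine_diff = 0, whisky_diff = -2 and beer_diff = -1. Missing dates need no special handling here, because a day without a row simply contributes nothing to either sum.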

+1

Holger brandt Jun 14 '12 at 16:54

Typically, when developing queries like this, build them in pieces and then combine them. First find a good structure, and then build all the parts you need, so that you can understand how each piece works on its own.

Here, I think you will need to use more subqueries to find a clear way to do this. I think you could try something in this direction:

Calculate date ranges and hold them as variables. (You can add days to a date to find the next period, rather than the code you specified above.)

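    Declare @SQL1 Date, @SQL2 Date, @SQL3 Date
    Set @SQL1 = (SQL1)
    ...

Then compute the totals for each period, using those dates as parameters.

    Select sum(wine_glasses_drunk) as wine_totals,
           sum(whisky_shots_drunk) as whiskey_totals,
           sum(beer_bottle_drunk) as beer_totals,
           -- a row dated exactly @SQL2 matches the first branch, so it counts in period 1
           case when data_entry_date between @SQL1 and @SQL2 then 1
                when data_entry_date between @SQL2 and @SQL3 then 2
           end as period_number
    from foobar
    where data_entry_date between @SQL1 and @SQL3
    -- group by the same expression so that each period gets one row of totals
    group by case when data_entry_date between @SQL1 and @SQL2 then 1
                  when data_entry_date between @SQL2 and @SQL3 then 2
             end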

Then write the summary query you need. With the data in this shape it is simpler, and you do not have to sum the same values several times.

0

David manheim Jun 12 '12 at 16:28

Mike meyers · Accepted Answer · 2012-06-14T18:46:48+0000

I was going to add this as an edit to my other answer, but it is really a different way of doing this, so it deserves a separate answer.

I think I prefer the other answer I gave, but this one should work even if there are gaps in the data.

To set the parameters for the query, change the values of period_start_date and period_days in the query_params subquery of the WITH clause.

    with query_params as (
        select date '2011-01-01' as period_start_date,
               7 as period_days
    ),
    summary_data as (
        select user_id,
               (data_entry_date - period_start_date)/period_days as period_number,
               sum(wine_glasses_drunk) as wine_glasses_drunk,
               sum(whisky_shots_drunk) as whisky_shots_drunk,
               sum(beer_bottle_drunk) as beer_bottle_drunk
        from foobar
        cross join query_params
        group by user_id, (data_entry_date - period_start_date)/period_days
    )
    select user_id,
           period_number,
           period_start_date + period_number * period_days as period_start_date,
           sum(wine_glasses_drunk) as wine_glasses_drunk,
           sum(whisky_shots_drunk) as whisky_shots_drunk,
           sum(beer_bottle_drunk) as beer_bottle_drunk
    from (
        -- this weeks data
        select user_id,
               period_number,
               wine_glasses_drunk,
               whisky_shots_drunk,
               beer_bottle_drunk
        from summary_data
        union all
        -- last weeks data
        select user_id,
               period_number + 1 as period_number,
               -wine_glasses_drunk as wine_glasses_drunk,
               -whisky_shots_drunk as whisky_shots_drunk,
               -beer_bottle_drunk as beer_bottle_drunk
        from summary_data
    ) a
    cross join query_params
    where period_number <= (select max(period_number) from summary_data)
    group by user_id, period_number, period_start_date + period_number * period_days
    order by 1, 2

And again, SQL Fiddle is available.
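The period numbering in this answer rests on two PostgreSQL behaviours: subtracting one date from another yields an integer number of days, and dividing integers discards the remainder. The sketch below isolates that arithmetic on a few inline sample dates, using the same start date and 7-day period as the query; the inline VALUES list is illustration data only.

```sql
-- Zero-based period number and the start date of that period.
select d as entry_date,
       (d - date '2011-01-01') / 7 as period_number,
       date '2011-01-01' + ((d - date '2011-01-01') / 7) * 7 as period_start_date
from (values (date '2011-01-01'),   -- 0 days after start  -> period 0
             (date '2011-01-07'),   -- 6 days after start  -> period 0
             (date '2011-01-08'),   -- 7 days after start  -> period 1
             (date '2011-01-20')    -- 19 days after start -> period 2, starting 2011-01-15
     ) as t(d)
order by d;
```

This sketch assumes every date falls on or after the start date. Integer division truncates toward zero, so dates up to six days before the start date would also land in period 0; choosing a period_start_date no later than the earliest row avoids that.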
