If you read the book “Developing Time-Oriented Applications in SQL”, RT Snodgrass (the pdf file of which can be found in its website under publications), and reaching Figure 6.25 on pages 165-166, you will find non-trivial SQL that can be used in the current example to group different lines with the same identifier value and continuous time intervals.
The version of the query below is close to correct, but at the end there is a problem that has its source in the first SELECT statement. I still do not understand the reason for the wrong answer. [If someone can test SQL in their DBMS and tell me if the first query works correctly there, that will be a big help!]
It looks something like this:
-- Derived from Figure 6.25 from Snodgrass "Developing Time-Oriented -- Database Applications in SQL" CREATE TABLE Data ( Start DATE, Finish DATE, ID CHAR(2), Amount INT ); INSERT INTO Data VALUES('2008-10-01', '2008-10-02', '01', 10); INSERT INTO Data VALUES('2008-10-02', '2008-10-03', '02', 20); INSERT INTO Data VALUES('2008-10-03', '2008-10-04', '01', 38); INSERT INTO Data VALUES('2008-10-04', '2008-10-05', '01', 23); INSERT INTO Data VALUES('2008-10-05', '2008-10-06', '03', 14); INSERT INTO Data VALUES('2008-10-06', '2008-10-07', '02', 3); INSERT INTO Data VALUES('2008-10-07', '2008-10-08', '02', 8); INSERT INTO Data VALUES('2008-10-08', '2008-11-08', '03', 19); SELECT DISTINCT F.ID, F.Start, L.Finish FROM Data AS F, Data AS L WHERE F.Start < L.Finish AND F.ID = L.ID -- There are no gaps between F.Finish and L.Start AND NOT EXISTS (SELECT * FROM Data AS M WHERE M.ID = F.ID AND F.Finish < M.Start AND M.Start < L.Start AND NOT EXISTS (SELECT * FROM Data AS T1 WHERE T1.ID = F.ID AND T1.Start < M.Start AND M.Start <= T1.Finish)) -- Cannot be extended further AND NOT EXISTS (SELECT * FROM Data AS T2 WHERE T2.ID = F.ID AND ((T2.Start < F.Start AND F.Start <= T2.Finish) OR (T2.Start <= L.Finish AND L.Finish < T2.Finish)));
The result of this query:
01 2008-10-01 2008-10-02 01 2008-10-03 2008-10-05 02 2008-10-02 2008-10-03 02 2008-10-06 2008-10-08 03 2008-10-05 2008-10-06 03 2008-10-05 2008-11-08 03 2008-10-08 2008-11-08
Edited . The problem with the penultimate line - it should not be. And I don’t know (yet) where it comes from.
Now we need to treat this complex expression as a query expression in the FROM clause of another SELECT statement, which sums the quantity values for a given identifier over elements that overlap with the maximum ranges shown above.
SELECT M.ID, M.Start, M.Finish, SUM(D.Amount) FROM Data AS D, (SELECT DISTINCT F.ID, F.Start, L.Finish FROM Data AS F, Data AS L WHERE F.Start < L.Finish AND F.ID = L.ID -- There are no gaps between F.Finish and L.Start AND NOT EXISTS (SELECT * FROM Data AS M WHERE M.ID = F.ID AND F.Finish < M.Start AND M.Start < L.Start AND NOT EXISTS (SELECT * FROM Data AS T1 WHERE T1.ID = F.ID AND T1.Start < M.Start AND M.Start <= T1.Finish)) -- Cannot be extended further AND NOT EXISTS (SELECT * FROM Data AS T2 WHERE T2.ID = F.ID AND ((T2.Start < F.Start AND F.Start <= T2.Finish) OR (T2.Start <= L.Finish AND L.Finish < T2.Finish)))) AS M WHERE D.ID = M.ID AND M.Start <= D.Start AND M.Finish >= D.Finish GROUP BY M.ID, M.Start, M.Finish ORDER BY M.ID, M.Start;
This gives:
ID Start Finish Amount 01 2008-10-01 2008-10-02 10 01 2008-10-03 2008-10-05 61 02 2008-10-02 2008-10-03 20 02 2008-10-06 2008-10-08 11 03 2008-10-05 2008-10-06 14 03 2008-10-05 2008-11-08 33
Edited . This is an almost correct dataset on which to aggregate the COUNT and SUM requested by the original question, so the final answer is:
SELECT I.ID, COUNT(*) AS Number, SUM(I.Amount) AS Amount FROM (SELECT M.ID, M.Start, M.Finish, SUM(D.Amount) AS Amount FROM Data AS D, (SELECT DISTINCT F.ID, F.Start, L.Finish FROM Data AS F, Data AS L WHERE F.Start < L.Finish AND F.ID = L.ID -- There are no gaps between F.Finish and L.Start AND NOT EXISTS (SELECT * FROM Data AS M WHERE M.ID = F.ID AND F.Finish < M.Start AND M.Start < L.Start AND NOT EXISTS (SELECT * FROM Data AS T1 WHERE T1.ID = F.ID AND T1.Start < M.Start AND M.Start <= T1.Finish)) -- Cannot be extended further AND NOT EXISTS (SELECT * FROM Data AS T2 WHERE T2.ID = F.ID AND ((T2.Start < F.Start AND F.Start <= T2.Finish) OR (T2.Start <= L.Finish AND L.Finish < T2.Finish))) ) AS M WHERE D.ID = M.ID AND M.Start <= D.Start AND M.Finish >= D.Finish GROUP BY M.ID, M.Start, M.Finish ) AS I GROUP BY I.ID ORDER BY I.ID; id number amount 01 2 71 02 2 31 03 3 66
Review : Ouch! Drat ... the entry for 3 has twice as much the "amount" that it should have. The previous “edited” parts indicate where everything went wrong. It seems that either the first request is subtly erroneous (maybe it is for a different question), or the optimizer I'm working with is incorrect. However, there must be an answer closely related to this that will give the correct meaning.
For the record: tested on IBM Informix Dynamic Server 11.50 on Solaris 10. However, it should work fine on any other mid-sized SQL DBMS.