Define continuous time intervals

Question

I have the following table structure:

id int -- more like a group id, not unique in the table
AddedOn datetime -- when the record was added

For a particular id, there is no more than one entry per day. I have to write a query that returns the contiguous (at day level) date intervals for each id. Expected result structure:

id int
StartDate datetime
EndDate datetime

Please note that AddedOn also carries a time part, but it is not relevant here.
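Since only the day matters, the time part can be stripped before dates are compared. A minimal sketch, assuming SQL Server 2008 or later (where the date type is available); the sample values and the alias AddedDay are made up for illustration:

```sql
-- Hypothetical sample rows that carry a time part
SELECT  d.id,
        d.AddedOn,
        CAST(d.AddedOn AS date) AS AddedDay   -- same day, time part removed
FROM    ( VALUES ( 1, CAST('2015-01-01T08:30:00' AS datetime) ),
                 ( 1, CAST('2015-01-02T17:45:00' AS datetime) ) ) AS d ( id, AddedOn );
```

DATEDIFF(day, ...) counts the day boundaries crossed, so it ignores the time part by itself; equality comparisons on AddedOn do not, which is where a cast like this helps.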

To make this clearer, here is some sample input:

with data as 
(
  select * from
  (
    values
    (0, getdate()), --dummy record used to infer column types

    (1, '20150101'),
    (1, '20150102'),
    (1, '20150104'),
    (1, '20150105'),
    (1, '20150106'),

    (2, '20150101'),
    (2, '20150102'),
    (2, '20150103'),
    (2, '20150104'),
    (2, '20150106'),
    (2, '20150107'),

    (3, '20150101'),
    (3, '20150103'),
    (3, '20150105'),
    (3, '20150106'),
    (3, '20150108'),
    (3, '20150109'),
    (3, '20150110')
  ) as d(id, AddedOn)
  where id > 0 -- exclude dummy record
)
select * from data

And the expected result:

id      StartDate      EndDate
1       2015-01-01     2015-01-02
1       2015-01-04     2015-01-06

2       2015-01-01     2015-01-04
2       2015-01-06     2015-01-07

3       2015-01-01     2015-01-01
3       2015-01-03     2015-01-03
3       2015-01-05     2015-01-06
3       2015-01-08     2015-01-10

Although this seems like a common problem, I could not find a sufficiently similar question. I am also working on a solution of my own and will post it when (and if) it works, but I feel there should be a more elegant way.

+4

sql sql-server tsql sql-server-2008

B0Andrew May 18, '15 at 12:33

5 answers

SQL Server 2008 does not have LEAD and LAG yet, so this version uses a self-join instead:

WITH    data
          AS ( SELECT   * ,
                        ROW_NUMBER() OVER ( ORDER BY id, AddedOn ) AS rn
               FROM     ( VALUES ( 0, GETDATE()), --dummy record used to infer column types
                        ( 1, '20150101'), ( 1, '20150102'), ( 1, '20150104'),
                        ( 1, '20150105'), ( 1, '20150106'), ( 2, '20150101'),
                        ( 2, '20150102'), ( 2, '20150103'), ( 2, '20150104'),
                        ( 2, '20150106'), ( 2, '20150107'), ( 3, '20150101'),
                        ( 3, '20150103'), ( 3, '20150105'), ( 3, '20150106'),
                        ( 3, '20150108'), ( 3, '20150109'), ( 3, '20150110') )
                        AS d ( id, AddedOn )
               WHERE    id > 0 -- exclude dummy record
             ),
        diff
          AS ( SELECT   d1.* ,
                        CASE WHEN ISNULL(DATEDIFF(dd, d2.AddedOn, d1.AddedOn),
                                         1) = 1 THEN 0
                             ELSE 1
                        END AS diff
               FROM     data d1
                        LEFT JOIN data d2 ON d1.id = d2.id
                                             AND d1.rn = d2.rn + 1
             ),
        parts
          AS ( SELECT   * ,
                        ( SELECT    SUM(diff)
                          FROM      diff d2
                          WHERE     d2.rn <= d1.rn
                        ) AS p
               FROM     diff d1
             )
    SELECT  id ,
            MIN(AddedOn) AS StartDate ,
            MAX(AddedOn) AS EndDate
    FROM    parts
    GROUP BY id ,
            p

Output:

id  StartDate               EndDate
1   2015-01-01 00:00:00.000 2015-01-02 00:00:00.000
1   2015-01-04 00:00:00.000 2015-01-06 00:00:00.000
2   2015-01-01 00:00:00.000 2015-01-04 00:00:00.000
2   2015-01-06 00:00:00.000 2015-01-07 00:00:00.000
3   2015-01-01 00:00:00.000 2015-01-01 00:00:00.000
3   2015-01-03 00:00:00.000 2015-01-03 00:00:00.000
3   2015-01-05 00:00:00.000 2015-01-06 00:00:00.000
3   2015-01-08 00:00:00.000 2015-01-10 00:00:00.000

Explanation:

The diff CTE returns (shown for id = 1):

1   2015-01-01 00:00:00.000 1   0
1   2015-01-02 00:00:00.000 2   0
1   2015-01-04 00:00:00.000 3   1
1   2015-01-05 00:00:00.000 4   0
1   2015-01-06 00:00:00.000 5   0

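Here the table is joined to itself to get the previous row of the same id. The difference in days between the current row and the previous row is then computed: if it is 1 day (or there is no previous row), the new column is 0, else it is 1.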

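The parts CTE takes the result of the previous step and computes a cumulative sum of the new column (the sum of all its values from the first row up to the current row), which gives the partitions to group by: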

1   2015-01-01 00:00:00.000 1   0   0
1   2015-01-02 00:00:00.000 2   0   0
1   2015-01-04 00:00:00.000 3   1   1
1   2015-01-05 00:00:00.000 4   0   1
1   2015-01-06 00:00:00.000 5   0   1
2   2015-01-01 00:00:00.000 6   0   1
2   2015-01-02 00:00:00.000 7   0   1
2   2015-01-03 00:00:00.000 8   0   1
2   2015-01-04 00:00:00.000 9   0   1
2   2015-01-06 00:00:00.000 10  1   2
2   2015-01-07 00:00:00.000 11  0   2
3   2015-01-01 00:00:00.000 12  0   2
3   2015-01-03 00:00:00.000 13  1   3

The last step simply groups by id and the new column and picks the MIN and MAX dates.
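For comparison, on SQL Server 2012 or later (not the 2008 version the question is tagged with), the same steps could be sketched with LAG and a windowed running SUM instead of the self-join and the correlated subquery. This is only an illustration under that version assumption, reduced to the id = 1 sample rows:

```sql
-- Requires SQL Server 2012+ (LAG and ordered window aggregates)
WITH data AS
(
    SELECT * FROM ( VALUES ( 1, CAST('20150101' AS datetime) ),
                           ( 1, '20150102' ),
                           ( 1, '20150104' ),
                           ( 1, '20150105' ),
                           ( 1, '20150106' ) ) AS d ( id, AddedOn )
),
diff AS
(
    SELECT  id,
            AddedOn,
            -- 0 when the previous row of the same id is exactly one day earlier
            -- (or there is no previous row), else 1
            CASE WHEN ISNULL(DATEDIFF(dd,
                                      LAG(AddedOn) OVER ( PARTITION BY id ORDER BY AddedOn ),
                                      AddedOn), 1) = 1 THEN 0
                 ELSE 1
            END AS diff
    FROM    data
),
parts AS
(
    SELECT  id,
            AddedOn,
            -- running total of the gap flags: constant within one interval
            SUM(diff) OVER ( PARTITION BY id ORDER BY AddedOn
                             ROWS UNBOUNDED PRECEDING ) AS p
    FROM    diff
)
SELECT  id,
        MIN(AddedOn) AS StartDate,
        MAX(AddedOn) AS EndDate
FROM    parts
GROUP BY id, p
ORDER BY id, StartDate;
```

For these five rows it should return the two intervals listed for id = 1 in the output above (2015-01-01 to 2015-01-02 and 2015-01-04 to 2015-01-06).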

+3

Giorgi Nakeuri May 18, '15 at 13:16

This takes the approach described as "Solution №3 from SQL MVP Deep Dives" in https://www.simple-talk.com/sql/t-sql-programming/the-sql-of-gaps-and-islands-in-sequences/ and applies it to your sample data:

with 
data as 
(
    select * from
    (
    values
    (0, getdate()), --dummy record used to infer column types

    (1, '20150101'),
    (1, '20150102'),
    (1, '20150104'),
    (1, '20150105'),
    (1, '20150106'),

    (2, '20150101'),
    (2, '20150102'),
    (2, '20150103'),
    (2, '20150104'),
    (2, '20150106'),
    (2, '20150107'),

    (3, '20150101'),
    (3, '20150103'),
    (3, '20150105'),
    (3, '20150106'),
    (3, '20150108'),
    (3, '20150109'),
    (3, '20150110')
    ) as d(id, AddedOn)
    where id > 0 -- exclude dummy record
)
,CTE_Seq
AS
(
    SELECT
        ID
        ,SeqNo
        ,SeqNo - ROW_NUMBER() OVER (PARTITION BY ID ORDER BY SeqNo) AS rn
    FROM
        data
        CROSS APPLY
        (
            SELECT DATEDIFF(day, '20150101', AddedOn) AS SeqNo
        ) AS CA
)
SELECT
    ID
    ,DATEADD(day, MIN(SeqNo), '20150101') AS StartDate
    ,DATEADD(day, MAX(SeqNo), '20150101') AS EndDate
FROM CTE_Seq
GROUP BY ID, rn
ORDER BY ID, StartDate;

ID  StartDate               EndDate
1   2015-01-01 00:00:00.000 2015-01-02 00:00:00.000
1   2015-01-04 00:00:00.000 2015-01-06 00:00:00.000
2   2015-01-01 00:00:00.000 2015-01-04 00:00:00.000
2   2015-01-06 00:00:00.000 2015-01-07 00:00:00.000
3   2015-01-01 00:00:00.000 2015-01-01 00:00:00.000
3   2015-01-03 00:00:00.000 2015-01-03 00:00:00.000
3   2015-01-05 00:00:00.000 2015-01-06 00:00:00.000
3   2015-01-08 00:00:00.000 2015-01-10 00:00:00.000

To understand how it works, examine the intermediate result of CTE_Seq. Simply run

select * from CTE_Seq

instead of the final SELECT ... GROUP BY .... It returns this result set:

ID  SeqNo   rn
1   0   -1
1   1   -1
1   3   0
1   4   0
1   5   0
2   0   -1
2   1   -1
2   2   -1
2   3   -1
2   5   0
2   6   0
3   0   -1
3   2   0
3   4   1
3   5   1
3   7   2
3   8   2
3   9   2

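Each date is converted to a sequence number by DATEDIFF(day, '20150101', AddedOn). ROW_NUMBER() generates a sequence without gaps, so the difference between the two stays the same within a run of consecutive dates and changes wherever there is a gap. The final SELECT with GROUP BY ID, rn therefore collapses each run into one row.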

+2

Vladimir Baranov May 18, '15 at 13:18

Here is a simple solution that uses only a self-join and a correlated subquery, with no window functions.

with
Data( ID, AddedOn )as(
  select 1, convert( date, '20150101' ) union all
  select 1, '20150102' union all
  select 1, '20150104' union all
  select 1, '20150105' union all
  select 1, '20150106' union all
  select 2, '20150101' union all
  select 2, '20150102' union all
  select 2, '20150103' union all
  select 2, '20150104' union all
  select 2, '20150106' union all
  select 2, '20150107' union all
  select 3, '20150101' union all
  select 3, '20150103' union all
  select 3, '20150105' union all
  select 3, '20150106' union all
  select 3, '20150108' union all
  select 3, '20150109' union all
  select 3, '20150110'
)
select  d.ID, d.AddedOn StartDate, IsNull( d1.AddedOn, '99991231' ) EndDate
from    Data    d
left join Data  d1
    on  d1.ID   = d.ID
    and d1.AddedOn  =(
        select  Min( AddedOn )
        from    Data
        where   ID  = d.ID
        and AddedOn > d.AddedOn );

In your situation, I assume that ID and AddedOn form a composite PK and therefore are indexed. Thus, the query will work impressively fast even on very large tables.
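The indexing assumption above could look like the following sketch. The table name dbo.Entries and the constraint name are hypothetical; only the two columns come from the question:

```sql
-- Hypothetical table: the composite primary key creates an index on (ID, AddedOn)
CREATE TABLE dbo.Entries
(
    ID      int      NOT NULL,   -- group id, not unique on its own
    AddedOn datetime NOT NULL,   -- when the record was added
    CONSTRAINT PK_Entries PRIMARY KEY CLUSTERED ( ID, AddedOn )
);
```

With such a key, the correlated subquery that looks for the smallest AddedOn greater than the current one within the same ID can be answered from the index.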

Also, I used an outer join because it seemed that the last AddedOn date of each ID should still appear in the StartDate column. Instead of NULL, I used a common MaxDate value. NULL would work just as well as a "this is the most recent StartDate" flag.

Here is the result for ID = 1:

ID          StartDate  EndDate
----------- ---------- ----------
1           2015-01-01 2015-01-02
1           2015-01-02 2015-01-04
1           2015-01-04 2015-01-05
1           2015-01-05 2015-01-06
1           2015-01-06 9999-12-31

+2

Tommcat May 20, '15 at 1:09

I would like to post my own solution as well, since it takes a different approach:

with data as 
(
  ...
),
temp as 
(
  select     d.id 
            ,d.AddedOn
            ,dprev.AddedOn as PrevAddedOn
            ,dnext.AddedOn as NextAddedOn
  FROM      data d
            left JOIN
            data dprev on   dprev.id = d.id
                       and  dprev.AddedOn = dateadd(d, -1, d.AddedOn)
            left JOIN
            data dnext on   dnext.id = d.id
                       and  dnext.AddedOn = dateadd(d,  1, d.AddedOn)
),
starts AS
(
  select     id
            ,AddedOn 
  from      temp 
  where     PrevAddedOn is NULL
),
ends as
(
  select     id
            ,AddedOn
  from      temp
  where     NextAddedOn is NULL
)
SELECT   s.id as id
        ,s.AddedOn as StartDate
        ,(select min(e.AddedOn) from ends e where e.id = s.id and e.AddedOn >= s.AddedOn) as EndDate
from    starts s

+1

B0Andrew May 18, '15 at 15:03

Stephan · Accepted Answer · 2015-05-18T13:21:30+0000

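Here is an answer that numbers the days with DATEDIFF, numbers the rows ordered by ID and date with row_number, and groups by the difference between the two.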

WITH CTE_dayOfYear
AS
(
    SELECT  id,
            AddedOn,
            DATEDIFF(DAY,'20000101',AddedOn) dyID,
            ROW_NUMBER() OVER (ORDER BY ID,AddedOn) row_num
    FROM data
)

SELECT  ID,
        MIN(AddedOn) StartDate,
        MAX(AddedOn) EndDate,
        dyID-row_num AS groupID
FROM CTE_dayOfYear
GROUP BY ID,dyID - row_num
ORDER BY ID,2,3

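The idea is that dyID counts the days since an arbitrary fixed date, while row_num numbers the rows ordered by ID and date. As long as the dates are consecutive, dyID and row_num both increase by one, so dyID - row_num stays constant. When a day is skipped, dyID jumps ahead of row_num, the difference changes, and a new group starts.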
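To see the grouping value row by row, the intermediate columns can be listed without the GROUP BY. A minimal sketch, using only the first four sample rows of id = 3 from the question as an inline data CTE:

```sql
-- First four sample rows for id = 3; the first value fixes the column type
WITH data AS
(
    SELECT * FROM ( VALUES ( 3, CAST('20150101' AS datetime) ),
                           ( 3, '20150103' ),
                           ( 3, '20150105' ),
                           ( 3, '20150106' ) ) AS d ( id, AddedOn )
)
SELECT  id,
        AddedOn,
        DATEDIFF(DAY, '20000101', AddedOn)               AS dyID,     -- day number
        ROW_NUMBER() OVER (ORDER BY id, AddedOn)         AS row_num,  -- gap-free row number
        DATEDIFF(DAY, '20000101', AddedOn)
            - ROW_NUMBER() OVER (ORDER BY id, AddedOn)   AS groupID   -- constant per run
FROM    data
ORDER BY id, AddedOn;
```

In this sample only 2015-01-05 and 2015-01-06 are consecutive, so only those two rows share a groupID; each of the other rows gets its own.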
