Merge intervals between overlapping dates

Is there a better way to merge overlapping date intervals?
The solution I came up with is so simple that I now wonder whether anyone has a better idea of how this can be done.

    /***** DATA EXAMPLE *****/
    DECLARE @T TABLE (d1 DATETIME, d2 DATETIME)
    INSERT INTO @T (d1, d2)
              SELECT '2010-01-01','2010-03-31'
        UNION SELECT '2010-04-01','2010-05-31'
        UNION SELECT '2010-06-15','2010-06-25'
        UNION SELECT '2010-06-26','2010-07-10'
        UNION SELECT '2010-08-01','2010-08-05'
        UNION SELECT '2010-08-01','2010-08-09'
        UNION SELECT '2010-08-02','2010-08-07'
        UNION SELECT '2010-08-08','2010-08-08'
        UNION SELECT '2010-08-09','2010-08-12'
        UNION SELECT '2010-07-04','2010-08-16'
        UNION SELECT '2010-11-01','2010-12-31'
        UNION SELECT '2010-03-01','2010-06-13'

    /***** INTERVAL ANALYSIS *****/
    WHILE (1=1) BEGIN
        UPDATE t1 SET t1.d2 = t2.d2
        FROM @T AS t1
        INNER JOIN @T AS t2 ON DATEADD(day, 1, t1.d2) BETWEEN t2.d1 AND t2.d2
        IF @@ROWCOUNT = 0 BREAK
    END

    /***** RESULT *****/
    SELECT StartDate = MIN(d1), EndDate = d2
    FROM @T
    GROUP BY d2
    ORDER BY StartDate, EndDate

    /***** OUTPUT *****/
    /*****
    StartDate   EndDate
    2010-01-01  2010-06-13
    2010-06-15  2010-08-16
    2010-11-01  2010-12-31
    *****/
+14
sql sql-server tsql
7 answers

I was looking for the same solution and came across this post on combining overlapping datetimes to return a single record per overlapping range.

There is also another thread on packing date intervals.

I tested this with various date ranges, including the ones listed here, and it works correctly every time.


    -- Note: this query assumes @T has columns named StartDate and EndDate
    -- (d1 and d2 in the question's sample table).
    SELECT s1.StartDate,
           --t1.EndDate
           MIN(t1.EndDate) AS EndDate
    FROM @T s1
    INNER JOIN @T t1 ON s1.StartDate <= t1.EndDate
        AND NOT EXISTS(SELECT * FROM @T t2
                       WHERE t1.EndDate >= t2.StartDate AND t1.EndDate < t2.EndDate)
    WHERE NOT EXISTS(SELECT * FROM @T s2
                     WHERE s1.StartDate > s2.StartDate AND s1.StartDate <= s2.EndDate)
    GROUP BY s1.StartDate
    ORDER BY s1.StartDate

Result:

    StartDate  | EndDate
    2010-01-01 | 2010-06-13
    2010-06-15 | 2010-06-25
    2010-06-26 | 2010-08-16
    2010-11-01 | 2010-12-31
+19

You asked this back in 2010 but did not specify a version.

Here is an answer for people on SQL Server 2012 or later:

    WITH T1 AS (
        SELECT *,
               MAX(d2) OVER (ORDER BY d1) AS max_d2_so_far
        FROM @T
    ),
    T2 AS (
        SELECT *,
               CASE
                   WHEN d1 <= DATEADD(DAY, 1, LAG(max_d2_so_far) OVER (ORDER BY d1))
                       THEN 0
                   ELSE 1
               END AS range_start
        FROM T1
    ),
    T3 AS (
        SELECT *,
               SUM(range_start) OVER (ORDER BY d1) AS range_group
        FROM T2
    )
    SELECT range_group,
           MIN(d1) AS d1,
           MAX(d2) AS d2
    FROM T3
    GROUP BY range_group

Which returns

    +-------------+------------+------------+
    | range_group | d1         | d2         |
    +-------------+------------+------------+
    | 1           | 2010-01-01 | 2010-06-13 |
    | 2           | 2010-06-15 | 2010-08-16 |
    | 3           | 2010-11-01 | 2010-12-31 |
    +-------------+------------+------------+

DATEADD(DAY, 1, ...) is used because your desired results indicate that you want the period ending on 2010-06-25 to be merged with the one starting on 2010-06-26. For other use cases this may need adjusting.
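For readers who want to follow the logic outside SQL, here is a minimal Python sketch of the same running-maximum idea. It is not part of the answer's query: the names merge_ranges, SAMPLE and the gap parameter are made up for illustration, and gap plays the role of the DATEADD(DAY, 1, ...) adjustment.

```python
from datetime import date, timedelta

def merge_ranges(ranges, gap=timedelta(days=1)):
    """Merge inclusive (start, end) date ranges that overlap or lie within `gap`."""
    merged = []
    for start, end in sorted(ranges):
        # merged[-1][1] plays the role of max_d2_so_far for the current group.
        if merged and start <= merged[-1][1] + gap:
            merged[-1][1] = max(merged[-1][1], end)   # range_start = 0
        else:
            merged.append([start, end])               # range_start = 1
    return [tuple(r) for r in merged]

# A few rows taken from the question's sample data.
SAMPLE = [
    (date(2010, 6, 26), date(2010, 7, 10)),
    (date(2010, 6, 15), date(2010, 6, 25)),
    (date(2010, 11, 1), date(2010, 12, 31)),
    (date(2010, 7, 4), date(2010, 8, 16)),
]

print(merge_ranges(SAMPLE))
```

Passing gap=timedelta(0) corresponds to dropping the one-day adjustment, in which case the range ending on 2010-06-25 and the one starting on 2010-06-26 stay separate.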

+6

Here is a solution that needs just three simple scans. No CTEs, no recursion, no joins, no table updates in a loop, no GROUP BY; as a result, it should scale better (I think). The number of scans can probably be reduced to two if the minimum and maximum dates are known in advance, because the logic itself needs only two scans: "find the gaps", applied twice.

    DECLARE @datefrom DATETIME, @datethru DATETIME

    DECLARE @T TABLE (d1 DATETIME, d2 DATETIME)
    INSERT INTO @T (d1, d2)
              SELECT '2010-01-01','2010-03-31'
        UNION SELECT '2010-03-01','2010-06-13'
        UNION SELECT '2010-04-01','2010-05-31'
        UNION SELECT '2010-06-15','2010-06-25'
        UNION SELECT '2010-06-26','2010-07-10'
        UNION SELECT '2010-08-01','2010-08-05'
        UNION SELECT '2010-08-01','2010-08-09'
        UNION SELECT '2010-08-02','2010-08-07'
        UNION SELECT '2010-08-08','2010-08-08'
        UNION SELECT '2010-08-09','2010-08-12'
        UNION SELECT '2010-07-04','2010-08-16'
        UNION SELECT '2010-11-01','2010-12-31'

    -- Sentinel dates one day outside the data on either side
    SELECT @datefrom = MIN(d1) - 1, @datethru = MAX(d2) + 1 FROM @T

    -- Outer pass: the gaps between the gaps are the merged ranges
    SELECT StartDate, EndDate
    FROM (
        SELECT MAX(EndDate) OVER (ORDER BY StartDate) + 1 StartDate,
               LEAD(StartDate) OVER (ORDER BY StartDate) - 1 EndDate
        FROM (
            -- Inner pass: the gaps between the original ranges
            SELECT StartDate, EndDate
            FROM (
                SELECT MAX(EndDate) OVER (ORDER BY StartDate) + 1 StartDate,
                       LEAD(StartDate) OVER (ORDER BY StartDate) - 1 EndDate
                FROM (
                    SELECT d1 StartDate, d2 EndDate FROM @T
                    UNION ALL SELECT @datefrom StartDate, @datefrom EndDate
                    UNION ALL SELECT @datethru StartDate, @datethru EndDate
                ) T
            ) T
            WHERE StartDate <= EndDate
            UNION ALL SELECT @datefrom StartDate, @datefrom EndDate
            UNION ALL SELECT @datethru StartDate, @datethru EndDate
        ) T
    ) T
    WHERE StartDate <= EndDate

Result:

    StartDate   EndDate
    2010-01-01  2010-06-13
    2010-06-15  2010-08-16
    2010-11-01  2010-12-31
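The "find the gaps, applied twice" idea can be sketched in Python as follows. This is only an illustration under my own naming (find_gaps, merge_ranges and SAMPLE are hypothetical, not taken from the answer); it mirrors the running MAX(EndDate) and LEAD(StartDate) expressions and the two sentinel rows.

```python
from datetime import date, timedelta
from itertools import accumulate

ONE_DAY = timedelta(days=1)

def find_gaps(ranges, lo, hi):
    """Inclusive date ranges strictly between the sentinels lo and hi that `ranges` leave uncovered."""
    rows = sorted(list(ranges) + [(lo, lo), (hi, hi)])
    starts = [s for s, _ in rows]
    running_max_end = list(accumulate((e for _, e in rows), max))
    gaps = []
    for i in range(len(rows) - 1):                 # the last row has no LEAD value
        gap_start = running_max_end[i] + ONE_DAY   # MAX(EndDate) OVER (...) + 1
        gap_end = starts[i + 1] - ONE_DAY          # LEAD(StartDate) OVER (...) - 1
        if gap_start <= gap_end:                   # WHERE StartDate <= EndDate
            gaps.append((gap_start, gap_end))
    return gaps

def merge_ranges(ranges):
    """The gaps between the gaps are the merged ranges."""
    lo = min(s for s, _ in ranges) - ONE_DAY
    hi = max(e for _, e in ranges) + ONE_DAY
    return find_gaps(find_gaps(ranges, lo, hi), lo, hi)

# A few rows taken from the question's sample data.
SAMPLE = [
    (date(2010, 1, 1), date(2010, 3, 31)),
    (date(2010, 3, 1), date(2010, 6, 13)),
    (date(2010, 6, 15), date(2010, 6, 25)),
    (date(2010, 6, 26), date(2010, 7, 10)),
]

print(merge_ranges(SAMPLE))
```

Because the end dates are inclusive, two ranges that touch (one ends the day before the next starts) leave no gap between them and therefore come out merged, which matches the output requested in the question.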
+1

In this solution, I build a temporary calendar table that holds one row for every day in a range. A table like this could also be made permanent. I initially stored 400-odd dates starting from 2009-12-31; obviously, if your dates span a wider range, you will need more values.

This solution only works on SQL Server 2005 and later, since it uses CTEs.

    With Calendar As
    (
        Select DateAdd(d, ROW_NUMBER() OVER ( ORDER BY s1.object_id ), '1900-01-01') As [Date]
        From sys.columns as s1
            Cross Join sys.columns as s2
    )
    , StopDates As
    (
        Select C.[Date]
        From Calendar As C
            Left Join @T As T
                On C.[Date] Between T.d1 And T.d2
        Where C.[Date] >= ( Select Min(T2.d1) From @T As T2 )
            And C.[Date] <= ( Select Max(T2.d2) From @T As T2 )
            And T.d1 Is Null
    )
    , StopDatesInUse As
    (
        Select D1.[Date]
        From StopDates As D1
            Left Join StopDates As D2
                On D1.[Date] = DateAdd(d,1,D2.Date)
        Where D2.[Date] Is Null
    )
    , DataWithEariestStopDate As
    (
        Select *
            , (Select Min(SD2.[Date])
                From StopDatesInUse As SD2
                Where T.d2 < SD2.[Date] ) As StopDate
        From @T As T
    )
    Select Min(d1), Max(d2)
    From DataWithEariestStopDate
    Group By StopDate
    Order By Min(d1)

EDIT: The problem with dates in 2009 has nothing to do with the final query. The problem was that the Calendar table was not big enough: I originally started it at 2009-12-31, and I have now changed it to start at 1900-01-01.

0

Try this:

    ;WITH T1 AS
    (
        SELECT d1, d2, ROW_NUMBER() OVER(ORDER BY (SELECT 0)) AS R
        FROM @T
    ), NUMS AS
    (
        SELECT ROW_NUMBER() OVER(ORDER BY (SELECT 0)) AS R
        FROM T1 A
        CROSS JOIN T1 B
        CROSS JOIN T1 C
    ), ONERANGE AS
    (
        SELECT DISTINCT DATEADD(DAY, ROW_NUMBER() OVER(PARTITION BY T1.R ORDER BY (SELECT 0)) - 1, T1.D1) AS ELEMENT
        FROM T1
        CROSS JOIN NUMS
        WHERE NUMS.R <= DATEDIFF(DAY, d1, d2) + 1
    ), SEQUENCE AS
    (
        SELECT ELEMENT, DATEDIFF(DAY, '19000101', ELEMENT) - ROW_NUMBER() OVER(ORDER BY ELEMENT) AS rownum
        FROM ONERANGE
    )
    SELECT MIN(ELEMENT) AS StartDate, MAX(ELEMENT) as EndDate
    FROM SEQUENCE
    GROUP BY rownum

The basic idea is to expand the existing data first, so that you get a separate row for each day. This is done in ONERANGE.

Then compare how the dates increase with how the row numbers increase. The difference between the two stays constant within a contiguous range (an island). As soon as you reach a new island of data, the difference grows, because the date jumps by more than 1 while the row number increases by exactly 1.
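To make the "date minus row number" trick concrete, here is a small Python sketch of the same two steps. The function name merge_by_day_expansion and the SAMPLE data are illustrative assumptions, not part of the query above; date.toordinal() stands in for DATEDIFF(DAY, '19000101', ELEMENT).

```python
from datetime import date, timedelta
from itertools import groupby

def merge_by_day_expansion(ranges):
    """Merge inclusive date ranges by expanding them to days and grouping islands."""
    # Step 1 (ONERANGE): one distinct entry per covered day.
    days = sorted({start + timedelta(days=n)
                   for start, end in ranges
                   for n in range((end - start).days + 1)})
    # Step 2 (SEQUENCE): day number minus row number is constant inside an island.
    keyed = [(day.toordinal() - row, day) for row, day in enumerate(days)]
    islands = []
    for _, group in groupby(keyed, key=lambda pair: pair[0]):
        members = [day for _, day in group]
        islands.append((members[0], members[-1]))   # MIN(ELEMENT), MAX(ELEMENT)
    return islands

# A few rows taken from the question's sample data.
SAMPLE = [
    (date(2010, 8, 1), date(2010, 8, 5)),
    (date(2010, 8, 2), date(2010, 8, 7)),
    (date(2010, 8, 8), date(2010, 8, 8)),
    (date(2010, 11, 1), date(2010, 11, 3)),
]

print(merge_by_day_expansion(SAMPLE))
```

The cost grows with the number of days covered rather than the number of rows, so this approach suits short ranges better than multi-year ones.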

0

The idea is to simulate the sweep (scanning) algorithm for merging intervals. The solution is written so that it works across a wide range of SQL implementations; I tested it on MySQL, Postgres, SQL Server 2017, SQLite and even Hive.

Assume the table schema is as follows.

    CREATE TABLE t (
        a DATETIME,
        b DATETIME
    );

We also assume that each interval is half-open: [a, b).

A row (a, i, j) in the view r below means that j intervals cover the point a, and i intervals cover the previous point.

    CREATE VIEW r AS
    SELECT a,
           Sum(d) OVER (ORDER BY a ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS i,
           Sum(d) OVER (ORDER BY a ROWS UNBOUNDED PRECEDING) AS j
    FROM (SELECT a, Sum(d) AS d
          FROM (SELECT a, 1 AS d FROM t
                UNION ALL
                SELECT b, -1 AS d FROM t) e
          GROUP BY a) f;

We produce all the endpoints of the union of the intervals and pair up neighboring ones. Finally, we produce the set of merged intervals by selecting only the odd-numbered rows.

    SELECT a, b
    FROM (SELECT a,
                 Lead(a) OVER (ORDER BY a) AS b,
                 Row_number() OVER (ORDER BY a) AS n
          FROM r
          WHERE j=0 OR i=0 OR i is null) e
    WHERE n%2 = 1;
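The sweep can also be written as a short Python sketch, which may make the roles of i and j easier to see. The names merge_half_open and SAMPLE are hypothetical; the sketch follows the same idea (+1 at each start, -1 at each end, a running total) but is not a line-by-line translation of the view.

```python
from collections import Counter
from datetime import date

def merge_half_open(intervals):
    """Merge half-open [a, b) intervals with a +1/-1 sweep over their endpoints."""
    delta = Counter()
    for a, b in intervals:
        delta[a] += 1      # an interval opens at a
        delta[b] -= 1      # an interval closes at b
    merged, covering, start = [], 0, None
    for point in sorted(delta):
        before = covering            # i: intervals covering the previous point
        covering += delta[point]     # j: intervals covering this point
        if before == 0 and covering > 0:
            start = point                    # coverage begins
        elif before > 0 and covering == 0:
            merged.append((start, point))    # coverage ends
    return merged

# Hypothetical sample data; the second interval starts exactly where the first ends.
SAMPLE = [
    (date(2010, 1, 1), date(2010, 4, 1)),
    (date(2010, 4, 1), date(2010, 6, 1)),
    (date(2010, 3, 1), date(2010, 6, 14)),
    (date(2010, 11, 1), date(2011, 1, 1)),
]

print(merge_half_open(SAMPLE))
```

With half-open intervals, an interval that ends exactly where another begins is merged without any one-day adjustment, because the -1 and +1 at that point cancel out.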

I created a sample DB-Fiddle and SQL-Fiddle. I also wrote a blog post about merging intervals in SQL.

0

Here is my question: how do I collapse overlapping time intervals when the end can be NULL but the start is never NULL?

The code below does not work with my data.

    SELECT s1.StartDate,
           --t1.EndDate
           MIN(t1.EndDate) AS EndDate
    FROM @T s1
    INNER JOIN @T t1 ON s1.StartDate <= t1.EndDate
        AND NOT EXISTS(SELECT * FROM @T t2
                       WHERE t1.EndDate >= t2.StartDate AND t1.EndDate < t2.EndDate)
    WHERE NOT EXISTS(SELECT * FROM @T s2
                     WHERE s1.StartDate > s2.StartDate AND s1.StartDate <= s2.EndDate)
    GROUP BY s1.StartDate
    ORDER BY s1.StartDate

0
