Find the longest sequence of values in a table

Question

Find the longest sequence of values in a table

This is a SQL question, I think it is difficult - I am not sure if this can be achieved in a simple SQL statement or in a stored procedure:

I want to find the number of the longest sequence of the same (known) number in a table column:

Example:

TABLE: DATE SALEDITEMS 1/1/09 4 1/2/09 3 1/3/09 3 1/4/09 4 1/5/09 3

calling sp / sentences for 4 will give 1 calling sp / sentecne for 3 will give 2 as it was 2 times in a row on line 3.

I am running SQL Server 2008.

+4

sql sql-server tsql sql-server-2008 stored-procedures

Dani Nov 01 '09 at 10:17

source share

2 answers

Unconfirmed as you did not provide DDL and sample data:

 DECLARE @SALEDITEMS INT; SET @SALEDITEMS=3; SELECT MAX(cnt) FROM( SELECT COUNT(*) FROM YourTable JOIN ( SELECT y1.[Date] AS d1, y2.[Date] AS d2 FROM YourTable AS y1 JOIN YourTable AS y2 ON y1.SALEDITEMS=@SALEDITEMS AND y2.SALEDITEMS=@SALEDITEMS AND NOT EXISTS(SELECT 1 FROM YourTable AS y WHERE y.SALEDITEMS<>@SALEDITEMS AND y1.[Date] < y.[Date] AND y.[Date] < y2.[Date]) ) AS t WHERE [Date] BETWEEN t.d1 AND t.d2 ) AS t;

0

AK Nov 01 '09 at 10:38

source share

Justin grant · Accepted Answer · 2009-11-01T23:36:36+0000

UPDATE: I generated a million rows of random data and refused a recursive CTE solution, as its query plan did not use indexes in the optimizer very well.

But the non-recursive solution that I wrote in the original turned out to be great if an additional non-clustered index was added to it (SALEDITEMS, [DATE]). This makes sense, as the request should be filtered in both directions (both by date and by SALEDITEMS). With this additional index, requests per million rows are returned in less than 2 seconds to my (not very fleshy) desktop. Without this index, the query was slow.

By the way, this is a great example of how SQL Server cost-based query optimization is completely broken in some cases. The CTE recursive solution has a cost (on my PC) of 42 and takes at least a few minutes to complete. A non-recursive solution has a cost of 15,446 (!!!) and ends in 1.5 seconds. Moral of the story: When comparing SQL Server query plans, do not assume that costs necessarily correlate with query performance!

In any case, here is the solution I would recommend (the same non-recursive CTE that I posted earlier):

 DECLARE @SALEDITEMS INT = 3; WITH SalesNoMatch ([DATE], SALEDITEMS, NoMatchDate) AS ( SELECT [DATE], SALEDITEMS, (SELECT MIN([DATE]) FROM Sales s2 WHERE s2.SALEDITEMS <> @SALEDITEMS AND s2.[DATE] > s1.[DATE]) as NoMatchDate FROM Sales s1 ) , SalesMatchCount ([DATE], ConsecutiveCount) AS ( SELECT [DATE], 1+(SELECT COUNT(1) FROM Sales s2 WHERE s2.[DATE] > s1.[DATE] AND s2.[DATE] < NoMatchDate) FROM SalesNoMatch s1 WHERE s1.SALEDITEMS = @SALEDITEMS ) SELECT MAX(ConsecutiveCount) FROM SalesMatchCount;

Here is the DDL I used for testing, including the indexes you will need:

 CREATE TABLE [Sales]( [DATE] date NOT NULL, [SALEDITEMS] int NOT NULL ); CREATE UNIQUE CLUSTERED INDEX IX_Sales ON Sales ([DATE]); CREATE UNIQUE NONCLUSTERED INDEX IX_Sales2 ON Sales (SALEDITEMS, [DATE]);

And this is how I created my test data - 1000,001 rows with ascending dates with SALEDITEMS randomly set between 1 and 10.

 INSERT INTO Sales ([DATE], SALEDITEMS) VALUES ('1/1/09', 5) DECLARE @i int = 0; WHILE (@i < 1000000) BEGIN INSERT INTO Sales ([DATE], SALEDITEMS) SELECT DATEADD (d, 1, (SELECT MAX ([DATE]) FROM Sales)), ABS(CHECKSUM(NEWID())) % 10 + 1 SET @i = @i + 1; END

Here is the recursive CTE solution that I left: DECLARE @SALEDITEMS INT = 3;

 -- recursive CTE solution (remember to set MAXRECURSION!) WITH SalesRowNum ([DATE], SALEDITEMS, RowNum) AS ( SELECT [DATE], SALEDITEMS, ROW_NUMBER() OVER (ORDER BY s1.[DATE]) as RowNum FROM Sales s1 ) , SalesCTE (RowNum, [DATE], ConsecutiveCount) AS ( SELECT s1.RowNum, s1.[DATE], 1 AS ConsecutiveCount FROM SalesRowNum s1 WHERE SALEDITEMS = @SALEDITEMS UNION ALL SELECT s1.RowNum, s1.[DATE], ConsecutiveCount + 1 AS ConsecutiveCount FROM SalesRowNum s1 INNER JOIN SalesCTE s2 ON s1.RowNum = s2.RowNum + 1 WHERE SALEDITEMS = @SALEDITEMS ) SELECT MAX(ConsecutiveCount) FROM SalesCTE;

Find the longest sequence of values ​​in a table

More articles:

Find the longest sequence of values in a table