Calculate duration statistics between timestamps

I use SQL Server to store ticket validation data. One ticket can be validated in several places. I need to pair up records as "entry" and "exit" and calculate statistics about the duration elapsed between the two checks. Here is the table (simplified for clarity):

    CREATE TABLE TestDuration
    (
        VALIDATION_TIMESTAMP datetime,
        ID_TICKET bigint,
        ID_PLACE bigint
    )

And the data:

    INSERT INTO TestDuration (VALIDATION_TIMESTAMP, ID_TICKET, ID_PLACE) VALUES ('2012-07-25 19:24:05.700', 1, 1)
    INSERT INTO TestDuration (VALIDATION_TIMESTAMP, ID_TICKET, ID_PLACE) VALUES ('2012-07-25 20:08:04.250', 2, 2)
    INSERT INTO TestDuration (VALIDATION_TIMESTAMP, ID_TICKET, ID_PLACE) VALUES ('2012-07-26 10:18:13.040', 3, 3)
    INSERT INTO TestDuration (VALIDATION_TIMESTAMP, ID_TICKET, ID_PLACE) VALUES ('2012-07-26 10:18:20.990', 1, 2)
    INSERT INTO TestDuration (VALIDATION_TIMESTAMP, ID_TICKET, ID_PLACE) VALUES ('2012-07-26 10:18:29.290', 2, 4)
    INSERT INTO TestDuration (VALIDATION_TIMESTAMP, ID_TICKET, ID_PLACE) VALUES ('2012-07-26 10:25:37.040', 1, 4)

Here is the aggregation request:

    SELECT VisitDurationCalcTable.ID_PLACE AS ID_PLACE_IN,
           VisitDurationCalcTable.ID_NEXT_VISIT_PLACE AS ID_PLACE_OUT,
           COUNT(visitduration) AS NUMBER_OF_VISITS,
           AVG(visitduration) AS AVERAGE_VISIT_DURATION
    FROM (SELECT EntryData.VALIDATION_TIMESTAMP,
                 EntryData.ID_TICKET,
                 EntryData.ID_PLACE,
                 (SELECT TOP 1 ID_PLACE
                  FROM TestDuration
                  WHERE ID_TICKET = EntryData.ID_TICKET
                    AND VALIDATION_TIMESTAMP > EntryData.VALIDATION_TIMESTAMP
                  ORDER BY VALIDATION_TIMESTAMP ASC) AS ID_NEXT_VISIT_PLACE,
                 DATEDIFF(n, EntryData.VALIDATION_TIMESTAMP,
                          (SELECT TOP 1 VALIDATION_TIMESTAMP
                           FROM TestDuration
                           WHERE ID_TICKET = EntryData.ID_TICKET
                             AND VALIDATION_TIMESTAMP > EntryData.VALIDATION_TIMESTAMP
                           ORDER BY VALIDATION_TIMESTAMP ASC)) AS visitduration
          FROM TestDuration EntryData) AS VisitDurationCalcTable
    WHERE VisitDurationCalcTable.ID_NEXT_VISIT_PLACE IS NOT NULL
    GROUP BY VisitDurationCalcTable.ID_PLACE,
             VisitDurationCalcTable.ID_NEXT_VISIT_PLACE

The query works, but I quickly ran into performance problems: for 40K rows it takes about 3 minutes. I am not an SQL guru, so I can't figure out how to rewrite the query to run faster. This is not a critical report and it runs only once a month, but it still makes my application look bad. I have the feeling I'm missing something simple here.


TL;DR version

You are almost certainly missing an index that would help this query. Adding the missing index is likely to yield an order-of-magnitude improvement.
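If you are not sure which index a slow query wants, SQL Server's missing-index DMVs can suggest candidates. This is a sketch I've added for context, not part of the original answer; the DMV names are standard, but treat the suggestions as hints to evaluate, not as commands to follow blindly:

```sql
-- List missing-index suggestions recorded by the optimizer,
-- ordered by a rough estimate of potential benefit.
SELECT d.statement AS table_name,
       d.equality_columns,      -- columns used in equality predicates
       d.inequality_columns,    -- columns used in range predicates
       d.included_columns,      -- columns to add as INCLUDEd columns
       s.user_seeks,
       s.avg_user_impact        -- estimated % improvement for affected queries
FROM sys.dm_db_missing_index_details d
JOIN sys.dm_db_missing_index_groups g
  ON g.index_handle = d.index_handle
JOIN sys.dm_db_missing_index_group_stats s
  ON s.group_handle = g.index_group_handle
ORDER BY s.avg_user_impact * s.user_seeks DESC;
```

Note that these DMVs are reset on instance restart and never suggest filtered or clustered indexes, so they complement rather than replace reading the execution plan.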

Rewriting the query to use LEAD on SQL Server 2012 will also do the trick (although it too benefits from the missing index).

If you are still on 2005/2008, you can make some improvements to the existing query, but the effect will be relatively small compared with adding the index.

Longer version

For this to take 3 minutes, I assume you don't have any useful indexes at all, and that the biggest win will come from simply adding one (for a report run once a month, just copying the three columns into a suitably indexed #temp table may be enough if you don't want to create a permanent index).
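The #temp-table variant mentioned above might look like this. This is my own minimal sketch (the `#Visits` name is mine), assuming the monthly report only needs the three columns:

```sql
-- Copy just the columns the report needs into a temp table
SELECT VALIDATION_TIMESTAMP, ID_TICKET, ID_PLACE
INTO #Visits
FROM TestDuration;

-- Index it the same way the permanent index would be built
CREATE CLUSTERED INDEX IX_Visits
    ON #Visits (ID_TICKET, VALIDATION_TIMESTAMP);

-- ...run the report query against #Visits instead of TestDuration...

DROP TABLE #Visits;
```

The copy and index build are one-off costs paid only when the report runs, instead of a permanent index that every insert into TestDuration must maintain.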

You say you simplified the table for clarity and have 40K rows. Assuming the following test data:

    CREATE TABLE TestDuration
    (
        Id UNIQUEIDENTIFIER DEFAULT NEWID() PRIMARY KEY,
        VALIDATION_TIMESTAMP DATETIME,
        ID_TICKET BIGINT,
        ID_PLACE BIGINT,
        OtherColumns CHAR(100) NULL
    )

    INSERT INTO TestDuration (VALIDATION_TIMESTAMP, ID_TICKET, ID_PLACE)
    SELECT TOP 40000
           DATEADD(minute, ROW_NUMBER() OVER (ORDER BY (SELECT 0)), GETDATE()),
           ABS(CHECKSUM(NEWID())) % 10,
           ABS(CHECKSUM(NEWID())) % 100
    FROM master..spt_values v1,
         master..spt_values v2

Your original query takes 51 seconds on my machine at MAXDOP 1, with the following IO statistics:
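For reference (my addition, not part of the original answer): the elapsed-time and IO figures quoted throughout come from SQL Server's session statistics, which you can enable yourself to reproduce the measurements:

```sql
-- Report logical/physical reads and elapsed/CPU time per statement
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Append OPTION (MAXDOP 1) to the query under test to force a
-- serial plan, so timings are comparable across runs
SELECT COUNT(*) FROM TestDuration OPTION (MAXDOP 1);
```

The output appears in the Messages tab in SSMS, in the `Table '...'` format shown below.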

    Table 'Worktable'. Scan count 79990, logical reads 1167573, physical reads 0.
    Table 'TestDuration'. Scan count 3, logical reads 2472, physical reads 0.

Plan 1

For each of the 40,000 rows in the table, it performs two sorts of all the matching ID_TICKET rows to identify the next one in VALIDATION_TIMESTAMP order.

Just adding the index below reduces the elapsed time to 406 ms, an improvement of more than 100x (subsequent queries in this answer assume this index is in place).

    CREATE NONCLUSTERED INDEX IX
        ON TestDuration (ID_TICKET, VALIDATION_TIMESTAMP)
        INCLUDE (ID_PLACE)

In the new plan, the 80,000 sort and spool operations are replaced by index seeks.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0.
    Table 'TestDuration'. Scan count 79991, logical reads 255707, physical reads 0.

Plan 2

However, it still performs two seeks per row. Rewriting with CROSS APPLY combines them into one.

    SELECT VisitDurationCalcTable.ID_PLACE AS ID_PLACE_IN,
           VisitDurationCalcTable.ID_NEXT_VISIT_PLACE AS ID_PLACE_OUT,
           COUNT(visitduration) AS NUMBER_OF_VISITS,
           AVG(visitduration) AS AVERAGE_VISIT_DURATION
    FROM (SELECT EntryData.VALIDATION_TIMESTAMP,
                 EntryData.ID_TICKET,
                 EntryData.ID_PLACE,
                 CA.ID_PLACE AS ID_NEXT_VISIT_PLACE,
                 DATEDIFF(n, EntryData.VALIDATION_TIMESTAMP, CA.VALIDATION_TIMESTAMP) AS visitduration
          FROM TestDuration EntryData
               CROSS APPLY (SELECT TOP 1 ID_PLACE, VALIDATION_TIMESTAMP
                            FROM TestDuration
                            WHERE ID_TICKET = EntryData.ID_TICKET
                              AND VALIDATION_TIMESTAMP > EntryData.VALIDATION_TIMESTAMP
                            ORDER BY VALIDATION_TIMESTAMP ASC) CA) AS VisitDurationCalcTable
    GROUP BY VisitDurationCalcTable.ID_PLACE,
             VisitDurationCalcTable.ID_NEXT_VISIT_PLACE

This gives me an elapsed time of 269 ms:

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0.
    Table 'TestDuration'. Scan count 40001, logical reads 127988, physical reads 0.

Plan 3

While the number of reads is still quite large, the seeks are all against pages that were just read by the scan, so all the pages are in cache. The number of reads can be reduced by using a table variable.

    DECLARE @T TABLE
    (
        VALIDATION_TIMESTAMP DATETIME,
        ID_TICKET BIGINT,
        ID_PLACE BIGINT,
        RN INT,
        PRIMARY KEY (ID_TICKET, RN)
    )

    INSERT INTO @T
    SELECT VALIDATION_TIMESTAMP,
           ID_TICKET,
           ID_PLACE,
           ROW_NUMBER() OVER (PARTITION BY ID_TICKET ORDER BY VALIDATION_TIMESTAMP) AS RN
    FROM TestDuration

    SELECT T1.ID_PLACE AS ID_PLACE_IN,
           T2.ID_PLACE AS ID_PLACE_OUT,
           COUNT(*) AS NUMBER_OF_VISITS,
           AVG(DATEDIFF(n, T1.VALIDATION_TIMESTAMP, T2.VALIDATION_TIMESTAMP)) AS AVERAGE_VISIT_DURATION
    FROM @T T1
         INNER MERGE JOIN @T T2
           ON T1.ID_TICKET = T2.ID_TICKET
          AND T2.RN = T1.RN + 1
    GROUP BY T1.ID_PLACE,
             T2.ID_PLACE

However, for me at least, this slightly increased the elapsed time, to 301 ms (43 ms for the insert + 258 ms for the select), but it can still be a good option if you prefer not to create a permanent index.

    (Insert)
    Table 'TestDuration'. Scan count 1, logical reads 233, physical reads 0.

    (Select)
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0.
    Table '#0C50D423'. Scan count 2, logical reads 372, physical reads 0.

Plan

Finally, if you are on SQL Server 2012, you can use LEAD (SQL Fiddle):

    WITH CTE
         AS (SELECT ID_PLACE AS ID_PLACE_IN,
                    LEAD(ID_PLACE) OVER (PARTITION BY ID_TICKET ORDER BY VALIDATION_TIMESTAMP) AS ID_PLACE_OUT,
                    DATEDIFF(n, VALIDATION_TIMESTAMP,
                             LEAD(VALIDATION_TIMESTAMP) OVER (PARTITION BY ID_TICKET ORDER BY VALIDATION_TIMESTAMP)) AS VISIT_DURATION
             FROM TestDuration)
    SELECT ID_PLACE_IN,
           ID_PLACE_OUT,
           COUNT(*) AS NUMBER_OF_VISITS,
           AVG(VISIT_DURATION) AS AVERAGE_VISIT_DURATION
    FROM CTE
    WHERE ID_PLACE_OUT IS NOT NULL
    GROUP BY ID_PLACE_IN,
             ID_PLACE_OUT

This gave me an elapsed time of 249 ms:

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0.
    Table 'TestDuration'. Scan count 1, logical reads 233, physical reads 0.

Plan 4

The LEAD version also works well without the index. Omitting the optimal index adds an extra sort to the plan and means it has to scan the wider clustered index of my test table, but it still completes in an elapsed time of 293 ms.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0.
    Table 'TestDuration'. Scan count 1, logical reads 824, physical reads 0.
