Trying to optimize a query that selects an "approximate nearest record".

I have a table containing a lot of data, where the field we especially care about is the date. The reason is that the volume of data has grown roughly 30-fold, and the old approach will soon stop keeping up. The query I hope you can help me optimize:

  • take a list of dates (generated by a table function based on cte)
  • get one record for each of these dates
    • based on some definition of "closest"

For example, the current table contains data at roughly 5-second (give or take) intervals. I need to query this table and get the record closest to each 30-second step.

What I have now works fine; I'm just wondering whether it can be optimized further. If it can be done in Linq To SQL, that would be neat too. I'm also interested in index suggestions, given the number of date values (my table has ~2 million rows).

declare @st  datetime ; set @st  = '2012-01-31 05:05:00';
declare @end datetime ; set @end = '2012-01-31 05:10:00';

select distinct
    log.*   -- id, 
from 
    dbo.fn_GenerateDateSteps(@st, @end, 30) as d
        inner join lotsOfLogData log on log.Id = (
            select top 1 e.[Id]
            from 
                lotsOfLogData as e  -- contains data in 5 second intervals
            where
                e.stationId = 1000 
                -- search for dates in a certain range
                AND e.utcTime between DateAdd(s, -10, d.dt) AND DateAdd(s, 5, d.dt)
            order by
                -- get the 'closest'. this can change a little, but will always 
                -- be based on a difference between the dates
                abs(datediff(s, d.dt, e.utcTime)) 
        )
    -- updated the query to be correct. stationId should be inside the subquery

The structure of the lotsOfLogData table is shown below. There are relatively few station identifiers (maybe 50), but many entries for each one. We know the station identifier at query time.

create table ##lotsOfLogData (
    Id          bigint      identity(1,1) not null
,   StationId   int         not null
,   UtcTime     datetime    not null
    -- 20 other fields, used for other calculations
)
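On the index question: every probe filters on StationId plus a narrow UtcTime range, and only needs Id from the seek (the outer join then fetches the full row). A covering index sketch along these lines may help — the index name is my own, and which columns to INCLUDE depends on what the query actually selects:

```sql
-- Hypothetical covering index for the (StationId, UtcTime) range probes.
-- The correlated subquery only needs Id, so including it lets the seek
-- satisfy the subquery without key lookups.
create nonclustered index IX_lotsOfLogData_StationId_UtcTime
    on lotsOfLogData (StationId, UtcTime)
    include (Id);
```

If the outer query always selects all 20-odd columns, making the index fully covering is usually not worth the write cost; the seek-plus-bookmark-lookup shape above is the common trade-off.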

fn_GenerateDateSteps returns the following data set for the specified parameters:

[DT]
2012-01-31 05:05:00.000
2012-01-31 05:05:30.000
2012-01-31 05:06:00.000
2012-01-31 05:06:30.000  (and so on, every 30 seconds)
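The body of fn_GenerateDateSteps isn't shown in the question; since it's described as a table function based on a CTE, a minimal inline table-valued sketch that would produce the steps above might look like this (my guess at the implementation, not the original):

```sql
-- Hypothetical implementation: generate dates from @start to @end
-- at @stepSeconds intervals using a recursive CTE.
create function dbo.fn_GenerateDateSteps
    (@start datetime, @end datetime, @stepSeconds int)
returns table
as
return (
    with steps(dt) as (
        select @start
        union all
        select dateadd(s, @stepSeconds, dt)
        from steps
        where dateadd(s, @stepSeconds, dt) <= @end
    )
    select dt from steps
);
```

Note that a recursive CTE defaults to 100 recursion levels; for date ranges longer than 100 steps, the calling query would need OPTION (MAXRECURSION 0), since the hint cannot go inside an inline function.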

I also did it with a table variable, as shown below, but that turned out to be slightly more expensive.

declare @dates table ( dt datetime, ClosestId bigint); 
insert into @dates (dt) select dt from dbo.fn_GenerateDateSteps(@st, @end, 30)
update @dates set ClosestId = ( /* same subquery as above */ )
select * from lotsOfLogData inner join @dates on Id = ClosestId

Edit: Fixed

Tested against 200K+ rows. With a covering (id/time + includes(...)) index, both query shapes seek on the same [id + time] index and their costs come out nearly identical. :D

-- subtree cost (crossapply) : .0808
-- subtree cost (id based)   : .0797

-- see above query for what i ended up with

  • Use cross apply instead of the inner join on a correlated subquery.
  • Keep the where log.stationid filter inside the apply.

SELECT  DISTINCT log.*   -- id, 
FROM    dbo.fn_GenerateDateSteps(@st, @end, 30) AS d
        CROSS APPLY (
            SELECT  TOP 1 log.*
            FROM    lotsOfLogData AS log  -- contains data in 5 second intervals
            WHERE   -- search for dates in a certain range
                    utcTime between DATEADD(s, -10, d.dt) AND DATEADD(s, 5, d.dt)
                    AND log.stationid = 1000
            ORDER BY
                    -- get the 'closest'. this can change a little, but will always 
                    -- be based on a difference between the date
                    ABS(DATEDIFF(s, d.dt, UtcTime)) 
        ) log

Not a full answer, but a few thoughts and questions.

First: can you change the table structure at all (columns, types, computed columns)? There is quite a lot SQL Server can do with just DateAdd() and DateDiff().

Also bear in mind how SQL Server stores DATETIME values internally: the type has limited precision (rounded to roughly 3-millisecond increments), which can matter when you compare and bucket times.

Finally, which version of SQL Server are you on (2005 or 2008)? The available options differ between them.
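As a sketch of the DateAdd()/DateDiff() angle (my own illustration, not from the answer): each UtcTime can be snapped to its nearest 30-second step with integer second arithmetic, which avoids a separate range probe per step:

```sql
-- Snap each log row to the nearest 30-second boundary relative to @st
-- (here the question's literal values stand in for @st/@end).
-- datediff gives whole seconds since @st; adding half a step before the
-- integer division rounds to the nearest step rather than truncating.
select UtcTime,
       dateadd(s,
               ((datediff(s, '2012-01-31 05:05:00', UtcTime) + 15) / 30) * 30,
               '2012-01-31 05:05:00') as nearestStep
from lotsOfLogData
where StationId = 1000
  and UtcTime between '2012-01-31 05:05:00' and '2012-01-31 05:10:00';
```

From there, a single pass with ROW_NUMBER() partitioned by nearestStep and ordered by the absolute distance could pick one row per step, though whether that beats the apply-per-step plan depends on the indexes.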

