What is the fastest way (the best strategy) to run this date-range query?

Question


I have a table A with two datetime columns, startDate and endDate, in addition to some other columns. I have another table B with one datetime column called dates. This is on SQL Server 2005.

Here's the question: what is the best way to tune indexes, etc., to speed up the following query:

select .... from A , B where A.startDate >= B.dates and A.endDate < B.dates

Both tables have several thousand records.


sql-server tsql sql-server-2005

eschlech Aug 05 '09 at 11:52


10 answers

Quassnoi · Answer 1 · 2009-08-05T17:57:54+0000

Update:

See this blog post for an effective indexing strategy for your query using computed columns:

Effective Date Range Query: SQL Server

The basic idea is that we simply compute the rounded length and startDate of each range, and then search for them using equality conditions (which B-Tree indexes handle well).
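A minimal Python sketch of one way such rounding can work, assuming integer day numbers, half-open ranges [start, end) and lengths rounded up to a power of two (the blog post's exact rounding scheme may differ). The pair (rounded length, rounded start) plays the role of the computed columns: for each distinct rounded length only two equality lookups are needed, followed by an exact re-check. All names are hypothetical.

```python
from collections import defaultdict


def bucket_size(length: int) -> int:
    """Round a range length up to the next power of two (at least 1)."""
    size = 1
    while size < length:
        size *= 2
    return size


def build_buckets(ranges):
    """ranges: list of (start, end) integer pairs, half-open [start, end).

    Returns {(rounded_length, rounded_start): [row ids]}; these two values
    stand in for the computed columns the index would be built on.
    """
    buckets = defaultdict(list)
    for row_id, (start, end) in enumerate(ranges):
        size = bucket_size(end - start)
        buckets[(size, start // size * size)].append(row_id)
    return buckets


def rows_containing(ranges, buckets, point):
    """Ids of ranges with start <= point < end, found via equality lookups only."""
    hits = []
    for size in sorted({size for size, _ in buckets}):
        base = point // size * size
        # end - start <= size and start <= point < end imply
        # point - size < start <= point, so the rounded start is base - size or base.
        for key in ((size, base - size), (size, base)):
            for row_id in buckets.get(key, []):
                start, end = ranges[row_id]
                if start <= point < end:  # exact re-check of the original condition
                    hits.append(row_id)
    return sorted(hits)
```

For example, with ranges `[(0, 10), (5, 7), (20, 40)]`, `rows_containing(ranges, build_buckets(ranges), 6)` returns `[0, 1]`. The number of lookups grows with the number of distinct rounded lengths, which is why the rounding keeps that number small.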

In MySQL and SQL Server 2008 you can use SPATIAL (R-Tree) indexes.

They are especially good for conditions such as "select all records whose range contains a given point", which is exactly your case.

You store start_date and end_date as the beginning and end of a LineString (converting them to UNIX timestamps or some other numeric value), index it with a SPATIAL index, and find every LineString whose minimum bounding rectangle (MBR) contains the date value in question using MBRContains.
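The geometry behind this can be sketched in Python, with no actual R-Tree involved. The sketch assumes the LineString runs from (-1, start) to (1, end) so that its MBR has non-zero width, that the date becomes the point (0, timestamp), and that containment is closed at the boundaries; check your database's MBRContains semantics before relying on the edge cases. Function names are hypothetical.

```python
from datetime import datetime, timezone


def to_unix(dt: datetime) -> float:
    """Treat a naive datetime as UTC and return seconds since the epoch."""
    return dt.replace(tzinfo=timezone.utc).timestamp()


def range_to_mbr(start: datetime, end: datetime):
    """MBR of a LineString from (-1, start) to (1, end), as (xmin, ymin, xmax, ymax).

    The x coordinates only give the box a non-zero width.
    """
    y1, y2 = to_unix(start), to_unix(end)
    return (-1.0, min(y1, y2), 1.0, max(y1, y2))


def mbr_contains(mbr, x: float, y: float) -> bool:
    """Closed containment test of the point (x, y) in the box."""
    xmin, ymin, xmax, ymax = mbr
    return xmin <= x <= xmax and ymin <= y <= ymax


def range_contains(mbr, when: datetime) -> bool:
    """A date is tested as the point (0, timestamp)."""
    return mbr_contains(mbr, 0.0, to_unix(when))
```

The point of the transformation is that "date within range" becomes "point within box", which is exactly the question an R-Tree answers without scanning.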

See this blog post on how to do this in MySQL:

Overlap Ranges: MySQL

and a performance overview for SQL Server:

Overlap Ranges: SQL Server

The same solution can be applied to find which of the network ranges stored in the database a given IP belongs to.

That task, along with yours, is another frequently used example of this kind of condition.

Regular B-Tree indexes are not good if ranges can overlap.

If they can't (and you know it), you can use the brilliant solution suggested by @AlexKuznetsov.

Also note that the performance of this query depends entirely on your data distribution.

If you have many rows in B and few rows in A, you can simply build an index on B.dates and let A be read with a TS/CIS (table scan or clustered index scan).

This query will always read all rows from A and will use an Index Seek on B.dates in a nested loop.
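A Python model of that plan, assuming half-open ranges (start <= date < end): a sorted list searched with `bisect` stands in for the index on B.dates, and the loop over `a_rows` stands in for the scan of A. Names are hypothetical.

```python
from bisect import bisect_left


def join_with_seek(a_rows, b_dates):
    """a_rows: list of (a_id, start, end); b_dates: list of comparable dates.

    Returns (a_id, date) pairs with start <= date < end.
    """
    index = sorted(b_dates)  # stands in for the index on B.dates
    result = []
    for a_id, start, end in a_rows:  # outer side: every row of A is read
        lo = bisect_left(index, start)  # seek to the first date >= start
        hi = bisect_left(index, end)  # first date >= end stops the range
        for d in index[lo:hi]:
            result.append((a_id, d))
    return result
```

The cost is roughly one seek per row of A plus the matching rows, which is why this shape works best when A is the small table.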

If your data is distributed differently, i.e. you have many rows in A but few in B, and the ranges are usually short, then you can slightly change the design of the tables:

    A (start_date, interval_length)

Create a composite index on A (interval_length, start_date)

and use this query:

    SELECT  *
    FROM    (
            SELECT  DISTINCT interval_length
            FROM    a
            ) ai
    CROSS JOIN
            b
    JOIN    a
    ON      a.interval_length = ai.interval_length
            AND a.start_date BETWEEN b.date - ai.interval_length AND b.date
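To check that the rewrite returns the same rows as the plain range join, here is a small harness using Python's built-in `sqlite3` as a stand-in for SQL Server. Assumptions: dates are integer day numbers, ranges are closed (start_date <= date <= start_date + interval_length), and the table and index names are hypothetical.

```python
import sqlite3

# Hypothetical schema: integer day numbers stand in for datetime values.
SCHEMA = """
CREATE TABLE a (id INTEGER PRIMARY KEY, start_date INTEGER, interval_length INTEGER);
CREATE TABLE b (date INTEGER);
CREATE INDEX ix_a_len_start ON a (interval_length, start_date);
"""

# The rewritten query: the equality on interval_length plus the BETWEEN on
# start_date can be served by a range seek on ix_a_len_start.
REWRITTEN = """
SELECT a.id, b.date
FROM (SELECT DISTINCT interval_length FROM a) AS ai
CROSS JOIN b
JOIN a ON a.interval_length = ai.interval_length
      AND a.start_date BETWEEN b.date - ai.interval_length AND b.date
ORDER BY a.id, b.date
"""

# The plain range join the rewrite should be equivalent to.
NAIVE = """
SELECT a.id, b.date
FROM a
JOIN b ON b.date BETWEEN a.start_date AND a.start_date + a.interval_length
ORDER BY a.id, b.date
"""


def load(a_rows, b_dates):
    """Create an in-memory database holding a (id, start, length) and b (date)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO a VALUES (?, ?, ?)", a_rows)
    conn.executemany("INSERT INTO b VALUES (?)", [(d,) for d in b_dates])
    return conn
```

Running both queries against the same data and comparing the result lists is a cheap way to validate the rewrite before trying it on the real tables; the rewrite pays off only when the number of distinct interval lengths is small.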

Ian ringrose · Answer 2 · 2009-08-05T13:06:23+0000

I have worked at two companies (both building time and attendance systems) that had lots of tables with startDate and endDate columns. In my experience there are no good indexes that always work with date ranges.

Try indexes like (startDate, endDate DESC) and (endDate DESC, startDate) to see if they help; a lot depends on what the data in the table looks like. For example, if you have many old rows with an endDate before the dates you are looking for, forcing SQL Server to use an index based on (endDate, startDate) may help.
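A rough Python model of why the leading column matters, assuming half-open ranges (startDate <= date < endDate). It counts only how many index entries each seek has to read before the other date column filters them; this is not SQL Server's cost model. Names are hypothetical.

```python
from bisect import bisect_right


def rows_examined(ranges, point):
    """ranges: (start, end) pairs; point: the date being looked up.

    Returns (entries read via an index led by start,
             entries read via an index led by end).
    """
    starts = sorted(s for s, _ in ranges)
    ends = sorted(e for _, e in ranges)
    # Index led by startDate: the seek covers every row with start <= point.
    via_start = bisect_right(starts, point)
    # Index led by endDate: the seek covers every row with end > point.
    via_end = len(ends) - bisect_right(ends, point)
    return via_start, via_end
```

With 1,000 five-day ranges starting on days 0 to 999 and a lookup for day 990, `rows_examined` returns `(991, 14)`: an index led by startDate reads almost everything, while one led by endDate reads only the 14 rows that have not ended yet, 5 of which actually match.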

Also try an index that covers all the columns used in the WHERE clause, so SQL Server doesn't need to read the base table until it has worked out which rows to return.
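A small demonstration of the covering effect, using Python's `sqlite3` as a stand-in: SQLite's `EXPLAIN QUERY PLAN` reports `COVERING INDEX` when it can answer from the index without reading the table. The schema and index name are hypothetical; in SQL Server you would get the same effect by putting the columns in the index key or in an `INCLUDE` list.

```python
import sqlite3


def plan_for(conn, sql, params=()):
    """Return SQLite's EXPLAIN QUERY PLAN detail text as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(str(row[-1]) for row in rows)


def covering_demo():
    """Compare a query the index covers with one that must also read the table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE a (id INTEGER PRIMARY KEY, start_date INTEGER,
                        end_date INTEGER, notes TEXT);
        -- id is the rowid alias, so every index entry already carries it
        CREATE INDEX ix_a_dates ON a (start_date, end_date);
    """)
    where = " FROM a WHERE start_date <= ? AND end_date > ?"
    covered = plan_for(conn, "SELECT id" + where, (100, 100))
    not_covered = plan_for(conn, "SELECT notes" + where, (100, 100))
    return covered, not_covered
```

On recent SQLite versions the first plan reads along the lines of `SEARCH a USING COVERING INDEX ix_a_dates (start_date<?)`, while the second mentions only `USING INDEX`, meaning each match costs an extra lookup into the table.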

You may need index hints, since the query optimizer is unlikely to know the data well enough to choose the right index; this is one of the very few times I have had to use index hints.

You could also expand the data: keep a table of (date, row) pairs with one row for each date in each date range. However, keeping that index table up to date is a pain.
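A minimal Python sketch of that expanded (date, row) table, assuming integer day numbers with inclusive ends; a dict of lists stands in for the table, and the helper names are hypothetical. The update function shows where the maintenance pain comes from: changing one range rewrites one entry per day, for both the old and the new range.

```python
from collections import defaultdict


def expand(ranges):
    """ranges: {row_id: (start_day, end_day)}, ends inclusive.

    Returns {day: [row ids]}: one entry per day covered, so a lookup
    for a given date is a plain equality match.
    """
    by_day = defaultdict(list)
    for row_id, (start, end) in ranges.items():
        for day in range(start, end + 1):
            by_day[day].append(row_id)
    return by_day


def update_range(by_day, ranges, row_id, new_start, new_end):
    """Move one range: delete its old day entries, then add the new ones."""
    old_start, old_end = ranges[row_id]
    for day in range(old_start, old_end + 1):
        by_day[day].remove(row_id)
    ranges[row_id] = (new_start, new_end)
    for day in range(new_start, new_end + 1):
        by_day[day].append(row_id)
```

The lookup side becomes trivial (`by_day.get(day, [])`), at the price of storage proportional to the total length of all ranges.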

If you know that some of your date ranges cannot overlap (for example, an employee's employment records cannot overlap), see "Using CROSS APPLY to optimize joins on BETWEEN conditions".
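The core idea behind that approach (paraphrased here, not taken from the article's code) is that when ranges never overlap, at most one range can contain a given date: the one with the latest start not after it. So one seek (the equivalent of `TOP 1 ... ORDER BY startDate DESC`) plus one endDate check is enough. A Python sketch with `bisect` standing in for the index seek, assuming half-open ranges; names are hypothetical.

```python
from bisect import bisect_right


def build_lookup(ranges):
    """ranges: non-overlapping (start, end) pairs, half-open [start, end)."""
    ordered = sorted(ranges)
    starts = [s for s, _ in ordered]
    return ordered, starts


def find_range(ordered, starts, point):
    """Return the single range containing point, or None."""
    i = bisect_right(starts, point) - 1  # latest start <= point (the TOP 1 seek)
    if i >= 0 and point < ordered[i][1]:  # a single endDate check
        return ordered[i]
    return None
```

If ranges could overlap, the latest-start range might end early while an earlier one still covers the date, which is why this shortcut needs the no-overlap guarantee.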

At the end of the day, if you only have a few thousand records, a full table scan is not bad.

Quassnoi suggests using SPATIAL indexes. I have no experience with "abusing" spatial indexes this way, but I think it's worth a try. However, be very careful if you ever have to support multiple database vendors, since spatial indexes are fairly new. You may also still need the date columns for reporting tools, etc.

(Sooner or later you will need to find all rows that overlap a given date range, and then it becomes even harder to get indexes that return good results.)

AK · Answer 3 · 2009-08-05T13:24:35+0000

Useful link: Using CROSS APPLY to optimize joins on BETWEEN conditions

IordanTanev · Answer 4 · 2009-08-05T11:57:48+0000

SQL Server 2005 and 2008 include the Database Engine Tuning Advisor (SQL Server 2000 had its predecessor, the Index Tuning Wizard). Run your query through it and it tells you which indexes to add to make the query faster.

Dewfy · Answer 5 · 2009-08-05T12:02:01+0000

You need three indexes, on A.startDate, B.dates and A.endDate; a composite index on (A.endDate, A.startDate) may also be good. I don't know anything about the other columns or the purpose of these tables, but consider using a clustered index.

In any case, use the execution plan to decide between these options, because my suggestion is too general.

Scoregraphic · Answer 6 · 2009-08-05T12:02:57+0000

The following script lists possible missing indexes (you can filter the results by t.name).

    SELECT  t.name AS 'affected_table',
            'Create NonClustered Index IX_' + t.name + '_missing_'
            + CAST(ddmid.index_handle AS VARCHAR(10))
            + ' On ' + ddmid.STATEMENT
            + ' (' + ISNULL(ddmid.equality_columns, '')
            + CASE WHEN ddmid.equality_columns IS NOT NULL
                        AND ddmid.inequality_columns IS NOT NULL THEN ','
                   ELSE ''
              END
            + ISNULL(ddmid.inequality_columns, '')
            + ')'
            + ISNULL(' Include (' + ddmid.included_columns + ');', ';') AS sql_statement,
            ddmigs.user_seeks,
            ddmigs.user_scans,
            CAST((ddmigs.user_seeks + ddmigs.user_scans) * ddmigs.avg_user_impact AS INT) AS 'est_impact',
            ddmigs.last_user_seek
    FROM    sys.dm_db_missing_index_groups AS ddmig
            INNER JOIN sys.dm_db_missing_index_group_stats AS ddmigs
                ON ddmigs.group_handle = ddmig.index_group_handle
            INNER JOIN sys.dm_db_missing_index_details AS ddmid
                ON ddmig.index_handle = ddmid.index_handle
            INNER JOIN sys.tables AS t
                ON ddmid.OBJECT_ID = t.OBJECT_ID
    WHERE   ddmid.database_id = DB_ID()
            AND CAST((ddmigs.user_seeks + ddmigs.user_scans) * ddmigs.avg_user_impact AS INT) > 100
    ORDER BY CAST((ddmigs.user_seeks + ddmigs.user_scans) * ddmigs.avg_user_impact AS INT) DESC;

ongle · Answer 7 · 2009-08-05T12:40:40+0000

A little more information is needed. How many other columns are in the tables? Are these existing tables that already have many queries running against them, or are they all new tables? What performance problem are you seeing that prompted the question?

I assume that all three columns are NOT NULL (not only for the query semantics, but also for index usefulness).

I would start with a composite index on A (startDate, endDate) and another index on B.dates (though that one is most likely not needed). If these dates are not the main purpose of the tables, I would not make these columns the clustered index. This is doubly true if these are existing tables that other queries already run against; those queries may have been written with the existing clustered indexes in mind.

RBarryYoung · Answer 8 · 2009-08-05T17:48:08+0000

I would go with this:

    CREATE CLUSTERED INDEX IX_DateRange ON dbo.A (StartDate, EndDate DESC)
    GO

Jon · Answer 9 · 2009-08-05T12:20:18+0000

I would just add a clustered index on B.dates. Adding indexes on startDate and endDate won't buy you anything, because you will get index scans on A either way. A clustered index on B at least gives you an index seek on B. Table scans and clustered index scans are essentially the same thing, so there is no point in adding indexes just to get the words "Table Scan" out of your execution plan :)

I would mock this up a few ways, or see whether you can rewrite your query so that it does not need a scan on A, which I suspect isn't actually possible.

ba__friend · Answer 10 · 2009-08-05T11:55:06+0000

If you need to optimize it, try running this query in Query Analyzer.
