SQL Server Index Order (Datetime Field)

I have a question about SQL Server indexes. I am not a database administrator and I assume that the answer will be clear to those of you who are. I am using SQL Server 2008.

I have a table that looks like the following (but has more columns):

    CREATE TABLE [dbo].[Results](
        [ResultID] [int] IDENTITY(1,1) NOT NULL,
        [TypeID] [int] NOT NULL,
        [ItemID] [int] NOT NULL,
        [QueryTime] [datetime] NOT NULL,
        [ResultTypeID] [int] NOT NULL,
        [QueryDay] AS (datepart(day,[querytime])) PERSISTED,
        [QueryMonth] AS (datepart(month,[querytime])) PERSISTED,
        [QueryYear] AS (datepart(year,[querytime])) PERSISTED,
        CONSTRAINT [PK_Results] PRIMARY KEY CLUSTERED ( [ResultID] ASC )
            WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
    ) ON [PRIMARY]

The important fields here are ResultID, the primary key, and QueryTime, the date and time at which the result was received.

I also have the following index (among others):

    CREATE NONCLUSTERED INDEX [IDX_ResultDate] ON [dbo].[Results] ( [QueryTime] ASC )
        INCLUDE ( [ResultID], [ItemID], [TypeID])
        WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]

In a database where the table has about a million rows, the index is used when executing a query such as:

 select top 1 * from results where querytime>'2009-05-01' order by ResultID asc 

In another instance of the same database with 50 million rows, SQL Server decides not to use the index and scans the clustered index instead, which ends up being terribly slow (and the speed depends on the date). Even if I use query hints to force it to use IDX_ResultDate, it is still rather slow, and it spends 94% of the time sorting by ResultID. I figured that by creating an index with both ResultID and QueryTime as sorted columns in the index, I could speed up my query.

So I created the following:

    CREATE NONCLUSTERED INDEX [IDX_ResultDate2] ON [dbo].[Results] ( [QueryTime] ASC, [ResultID] ASC )
        INCLUDE ( [ItemID], [TypeID])
        WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
    GO

I assumed that SQL Server would first use the QueryTime ordering to find the matching results, which would then already be sorted by ResultID. However, this is not the case: this index makes no difference in performance compared to the existing one.

Then I tried the following index:

    CREATE NONCLUSTERED INDEX [IDX_ResultDate3] ON [dbo].[Results] ( [ResultID] ASC, [QueryTime] ASC )
        INCLUDE ( [ItemID], [TypeID])
        WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
    GO

This index gives the expected result. The query seems to return in constant time (a split second).

However, I am puzzled as to why IDX_ResultDate3 works well while IDX_ResultDate2 does not.

I would have thought that a binary search in a sorted list of QueryTime values, followed by peeking at the first result in its child list of ResultIDs, would be the fastest way to get the result. (Hence my initial sort order.)

Side question: should I create a persisted computed column holding the date part of QueryTime and index that (I already have three persisted columns, as you can see above)?

+6
database sql-server indexing
5 answers

"I would have thought that a binary search in a sorted list of QueryTime values, followed by peeking at the first result in its child list of ResultIDs, would be the fastest way to get the result. (Hence my initial sort order.)"

That would indeed be fast, but your query expresses a different request: you are asking for the result with the minimum ResultID among all rows whose QueryTime is after '2009-05-01'. To satisfy this, the engine has to seek to the beginning of the range ('2009-05-01'), scan from that position to extract all the ResultIDs, sort them, and then return the top 1 (the minimum ResultID). The second index you added, [IDX_ResultDate2], does not help much either. The query has to do almost the same seek and scan: the ResultIDs are sorted only within each QueryTime value, so to find the top ResultID among all the results after '2009-05-01', the query still has to scan the index to the end.

With the last index, [IDX_ResultDate3], the query is cheating. It starts a scan of the index and looks at the QueryTime value of each row, knowing that in this scan the first row whose QueryTime falls in the desired range (> '2009-05-01') is the one you want (because the index is ordered by ResultID, that row is guaranteed to be the top 1). You get the result in a "split second" by pure luck: a matching row happens to sit near the beginning of the index. The query may just as well have to scan the entire index and match only the very last row. You can insert a new result with a QueryTime of, say, '2010-01-01' and then search for it; you will see the performance degrade, because the query has to scan the entire index to the end (still faster than scanning the table, because the index is narrower).
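The experiment described above could be sketched roughly as follows. The inserted column values are placeholders, the cut-off date assumes that no existing row has a QueryTime after '2009-12-31', and since the real table has more columns than the excerpt shows, the INSERT may need adjusting:

```sql
-- Hypothetical test row; the values 1, 1, 1 are placeholders.
INSERT INTO dbo.Results (TypeID, ItemID, QueryTime, ResultTypeID)
VALUES (1, 1, '2010-01-01', 1);

SET STATISTICS IO ON;   -- report logical reads for each statement

-- Assuming only the new row qualifies, a scan in ResultID order
-- has to walk the whole index before it finds a match.
SELECT TOP 1 ResultID, QueryTime, ItemID, TypeID
FROM dbo.Results WITH (INDEX(IDX_ResultDate3))
WHERE QueryTime > '2009-12-31'
ORDER BY ResultID ASC;

SET STATISTICS IO OFF;
```

Comparing the logical reads with those of the original '2009-05-01' query should show how much the "split second" result depends on where the first matching row sits in the index.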

My question is: are you absolutely sure that your query has to return the TOP 1 row in ORDER BY ResultID? Or did you just choose that order arbitrarily? If you can change the ORDER BY to, say, QueryTime, then any of the indexes (updated: any of those with QueryTime as the leftmost column) will give you a simple seek and fetch, with no scans and no sorting.
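A minimal sketch of that variant, assuming the earliest matching row by QueryTime is an acceptable answer (note that this can be a different row from the one with the smallest ResultID):

```sql
-- The ORDER BY matches the key order of IDX_ResultDate2 (QueryTime, ResultID),
-- so the engine can seek to the start of the range and read a single row.
SELECT TOP 1 ResultID, QueryTime, ItemID, TypeID
FROM dbo.Results
WHERE QueryTime > '2009-05-01'
ORDER BY QueryTime ASC, ResultID ASC;
```

The column list is limited to columns that IDX_ResultDate2 already contains, so no lookup into the clustered index should be needed.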

+12

You have a range filter on one field combined with an ORDER BY on another field.

An index, even a composite index, cannot be used to serve both conditions in this case.

When you create an index on (queryTime, resultId), the index is used for filtering. The engine still needs to sort the result set.

When you create an index on (resultId, queryTime), the index is used for ordering. The engine still needs to check each row against the filter.

Since you need only the TOP 1 row, and a row that satisfies the filter is near the beginning of the index, the latter approach is better here.

If your filter condition were selective (that is, it returned only a few rows) and the first matching row were near the end of the index, the first approach would be better.

See this blog post for more details and guidance on which index to create under which conditions:

+4

You can change the clustered index to ([QueryTime], [ResultID]) or change the query from

 select top 1 * from results where querytime>'2009-05-01' order by ResultID asc 

to

 select top 1 <only the columns you actually need> from results where querytime>'2009-05-01' order by ResultID asc 

and include all those columns in [IDX_ResultDate2]
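As an illustration only, assuming the columns actually needed are ResultID, QueryTime, ItemID and TypeID (all of which [IDX_ResultDate2] already contains as key or included columns), the narrowed query could look like this:

```sql
-- Every selected column is in IDX_ResultDate2, so the index covers the query
-- and no lookups into the clustered index are required.
SELECT TOP 1 ResultID, QueryTime, ItemID, TypeID
FROM dbo.Results
WHERE QueryTime > '2009-05-01'
ORDER BY ResultID ASC;
```

If other columns are needed, they would have to be added to the INCLUDE list of the index for it to remain covering.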

+2

The first thing I would suggest is to check whether the statistics for this table (for all of its indexes) are up to date.

Since you get two different execution plans with different datasets, it seems that SQL Server is making the infamous "judgment call" when choosing one execution plan over another.
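One way to check and refresh the statistics, sketched here with standard catalog views and commands (a FULLSCAN on a 50-million-row table can take a while, so a sampled update may be preferable):

```sql
-- When was each statistics object on the table last updated?
SELECT s.name AS stats_name,
       STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID('dbo.Results');

-- Rebuild all statistics on the table by reading every row.
UPDATE STATISTICS dbo.Results WITH FULLSCAN;
```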

I agree with Remus's explanation of why you get "magic" results with the last index.

His suggestion is also good - do you really need to order by ResultID? If you can order by QueryTime instead, you will get GREAT performance, because the execution plan will be able to use the index order as the order of the result set (and it will do an index seek rather than a scan).

0

I'm not sure I can answer the question, but I will point out that the clustered index key is already included as part of every other index, so it is redundant to include ResultID explicitly in any of the other indexes you proposed.
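To illustrate the point, here is a sketch of the first index without the explicit ResultID. The index name IDX_ResultDate_Slim is hypothetical, and the WITH options from the question are left out for brevity:

```sql
-- ResultID is the clustered index key, so SQL Server stores it in every
-- nonclustered index row as the row locator without it being listed here.
CREATE NONCLUSTERED INDEX [IDX_ResultDate_Slim]
ON [dbo].[Results] ([QueryTime] ASC)
INCLUDE ([ItemID], [TypeID]);
```

This applies to where the column is stored, not to the sort order of the index: an index that leads with ResultID, such as IDX_ResultDate3, is still ordered differently from one that leads with QueryTime.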

0