I have a question about SQL Server indexes. I am not a database administrator and I assume that the answer will be clear to those of you who are. I am using SQL Server 2008.
I have a table that looks like the following (but has more columns):
CREATE TABLE [dbo].[Results](
    [ResultID] [int] IDENTITY(1,1) NOT NULL,
    [TypeID] [int] NOT NULL,
    [ItemID] [int] NOT NULL,
    [QueryTime] [datetime] NOT NULL,
    [ResultTypeID] [int] NOT NULL,
    [QueryDay] AS (datepart(day,[querytime])) PERSISTED,
    [QueryMonth] AS (datepart(month,[querytime])) PERSISTED,
    [QueryYear] AS (datepart(year,[querytime])) PERSISTED,
    CONSTRAINT [PK_Results] PRIMARY KEY CLUSTERED ( [ResultID] ASC )
        WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
) ON [PRIMARY]
The important fields here are ResultID, the primary key, and QueryTime, the date and time at which the result was received.
I also have the following index (among others):
CREATE NONCLUSTERED INDEX [IDX_ResultDate] ON [dbo].[Results]
    ( [QueryTime] ASC )
    INCLUDE ( [ResultID], [ItemID], [TypeID])
    WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
In a database with about a million rows in the table, the index is used when executing a query such as:
select top 1 * from results where querytime>'2009-05-01' order by ResultID asc
In another instance of the same database with 50 million rows, SQL Server decides not to use the index and scans the clustered index instead, which ends up being terribly slow (and the speed depends on the date). Even if I use an index hint to force it to use IDX_ResultDate, it is still rather slow, and it spends 94% of the time sorting by ResultID. I figured that by creating an index with both ResultID and QueryTime as sorted columns, I could speed up my query.
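For reference, forcing the index with a table hint looks roughly like the following. This is a sketch using the standard WITH (INDEX(...)) syntax on the same query as above; nothing else about the query is changed.

```sql
-- Same query as above, but forcing the nonclustered index via a table hint.
SELECT TOP 1 *
FROM [dbo].[Results] WITH (INDEX(IDX_ResultDate))
WHERE QueryTime > '2009-05-01'
ORDER BY ResultID ASC;
```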
So I created the following:
CREATE NONCLUSTERED INDEX [IDX_ResultDate2] ON [dbo].[Results]
    ( [QueryTime] ASC, [ResultID] ASC )
    INCLUDE ( [ItemID], [TypeID])
    WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
GO
I assumed that it would first use the QueryTime ordering to find the matching rows, which would then already be sorted by ResultID. However, this is not the case: this index makes no difference in performance compared to the existing one.
Then I tried the following index:
CREATE NONCLUSTERED INDEX [IDX_ResultDate3] ON [dbo].[Results]
    ( [ResultID] ASC, [QueryTime] ASC )
    INCLUDE ( [ItemID], [TypeID])
    WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
GO
This index gives the expected result. The query seems to return in constant time (a split second).
However, I am puzzled as to why IDX_ResultDate3 works well while IDX_ResultDate2 does not.
I would have assumed that a binary search on the sorted list of QueryTime values, followed by peeking at the first entry in its child list of ResultIDs, would be the fastest way to get the result. (Hence my initial sort order.)
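To make the comparison reproducible, the two indexes can be measured side by side with something like the script below. It is only a sketch: it assumes both indexes exist on the table, and it uses SET STATISTICS IO and SET STATISTICS TIME so that the logical reads and elapsed time for each variant appear in the Messages tab.

```sql
-- Report logical reads and CPU/elapsed time for each statement.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Variant 1: index keyed on (QueryTime, ResultID).
SELECT TOP 1 *
FROM [dbo].[Results] WITH (INDEX(IDX_ResultDate2))
WHERE QueryTime > '2009-05-01'
ORDER BY ResultID ASC;

-- Variant 2: index keyed on (ResultID, QueryTime).
SELECT TOP 1 *
FROM [dbo].[Results] WITH (INDEX(IDX_ResultDate3))
WHERE QueryTime > '2009-05-01'
ORDER BY ResultID ASC;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Because the query uses SELECT * and neither index includes every column, both variants need a lookup into the clustered index for the returned row, so that cost should not account for the difference between them.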
Side question: should I create a persisted column with the date part of QueryTime and an index on it (I already have three persisted columns, as you can see above)?
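To make the side question concrete, this is roughly what I have in mind. It is a sketch only: the names QueryDate and IDX_QueryDate are hypothetical, and it relies on the date type, which is available from SQL Server 2008 onwards.

```sql
-- Hypothetical persisted column holding only the date part of QueryTime.
ALTER TABLE [dbo].[Results]
    ADD [QueryDate] AS (CONVERT(date, [QueryTime])) PERSISTED;
GO

-- Hypothetical index on the new column.
CREATE NONCLUSTERED INDEX [IDX_QueryDate] ON [dbo].[Results]
    ( [QueryDate] ASC );
GO
```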