I have a longer SQL question for you SQL gurus :-)
I'm currently trying to understand some behavior in a larger application, but it comes down to querying these two tables:
Users table - approximately 750 entries, UserId ( varchar(50) ) as the clustered PK
ActionLog table - millions of records, includes UserId - but no FK relationship
For a grid in my ASP.NET application, I am trying to get all users and the date of their most recent log entry.
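To make the setup concrete, here is a rough sketch of what the two tables might look like. Only UserId, its varchar(50) type, the clustered PK on Users, the missing FK, and the [Timestamp] column come from the description; every other column name, data type, and constraint name is an assumption for illustration.

```sql
-- Hypothetical sketch: only UserId and [Timestamp] are taken from the question.
CREATE TABLE dbo.Users
(
    UserId   varchar(50)   NOT NULL,
    UserName nvarchar(100) NULL,              -- assumed stand-in for the "other columns"
    CONSTRAINT PK_Users PRIMARY KEY CLUSTERED (UserId)
);

CREATE TABLE dbo.ActionLog
(
    ActionLogId int IDENTITY(1,1) NOT NULL,   -- assumed surrogate key
    UserId      varchar(50)       NOT NULL,   -- same type as Users.UserId, deliberately no FOREIGN KEY
    [Timestamp] datetime          NOT NULL,   -- assumed data type
    CONSTRAINT PK_ActionLog PRIMARY KEY CLUSTERED (ActionLogId)
);
```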
The SQL statement that is currently being used looks something like this:
SELECT UserId, (other columns),
       LastLogDate = (SELECT TOP (1) [Timestamp]
                      FROM dbo.ActionLog a
                      WHERE a.UserId = u.UserId
                      ORDER BY [Timestamp] DESC)
FROM dbo.Users u;
and it returns the expected rows - but it is rather slow (about 20 seconds).
My first thought was to add an index to the ActionLog table on UserId and include the Timestamp column in it:
CREATE NONCLUSTERED INDEX [IDX_UserId] ON [dbo].[ActionLog]([UserId] ASC) INCLUDE ([Timestamp])
Rows are now returned very quickly - in less than 2 seconds, with 350,000 entries in the ActionLog table - and my index is used just fine, as the execution plan shows. Everything looks great.
Now, to approximate the production scenario, we loaded approximately 2 million rows into the ActionLog table, 95% or more of which belong to a non-existent user (i.e. these rows have a UserId that does not exist in the Users table).
Now, unexpectedly, the query becomes extremely slow (24 minutes!) and the index is no longer used.
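For anyone who wants to reproduce the measurement, this is a minimal sketch of how the timing and reads can be checked in SQL Server Management Studio. The "other columns" placeholder is left out here, and the actual execution plan has to be enabled separately in the tool; that the numbers were gathered exactly this way is an assumption.

```sql
-- Report elapsed time and logical reads per table in the Messages tab.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT u.UserId,
       LastLogDate = (SELECT TOP (1) a.[Timestamp]
                      FROM dbo.ActionLog a
                      WHERE a.UserId = u.UserId
                      ORDER BY a.[Timestamp] DESC)
FROM dbo.Users u;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

A very high logical read count on ActionLog, together with a scan instead of a seek in the plan, would match the "index is no longer used" observation.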
I assumed that, since the vast majority of the entries in the ActionLog table do not match an existing user, I would see a performance boost if I used a filtered index to "filter out" all these messy entries without a corresponding user, so I created this index (replacing the one that existed before):
CREATE NONCLUSTERED INDEX [IDX_UserId] ON [dbo].[ActionLog]([UserId] ASC) INCLUDE ([Timestamp]) WHERE UserId <> 'user' -- that's the fixed, non-existent "UserId" I wanted to avoid
But to my horror, the query performs about the same: it still takes more than 20 minutes. I updated the statistics - no change - still very slow.
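For reference, the statistics refresh was along these lines. This is a sketch: the FULLSCAN option is an assumption, and a default sampled update is equally possible.

```sql
-- Refresh all statistics on the table, reading every row rather than a sample.
UPDATE STATISTICS dbo.ActionLog WITH FULLSCAN;

-- Or only the statistics that belong to the index in question.
UPDATE STATISTICS dbo.ActionLog IDX_UserId WITH FULLSCAN;
```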
Funny thing (to me): when I dropped the index and recreated it, the query was really fast again (once more, less than 3 seconds). WOW!
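The drop and recreate step looks roughly like this, assuming the filtered index lives on dbo.ActionLog and keeps the same definition as before:

```sql
-- Remove the existing filtered index ...
DROP INDEX [IDX_UserId] ON [dbo].[ActionLog];

-- ... and build it again from scratch with the same filter.
CREATE NONCLUSTERED INDEX [IDX_UserId]
    ON [dbo].[ActionLog] ([UserId] ASC)
    INCLUDE ([Timestamp])
    WHERE UserId <> 'user';
```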
But as soon as I add more entries again, the query "tips over" and becomes really, really slow...
I do not quite understand why this is happening. I thought that with a filtered index that eliminates all these rogue entries, I would see good performance when trying to find the most recent ActionLog entry for existing users - but that doesn't seem to be the case.
WHY NOT?
Any ideas? Thoughts? What to try?