SQL Server query not performing the way I expected

I have a longer SQL question for you SQL gurus :-)

I'm currently trying to understand some behavior in a larger application, but it comes down to querying these two tables:

  • Users table - approximately 750 rows, with UserId ( varchar(50) ) as the clustered PK
  • ActionLog table - millions of rows, includes a UserId column - but no FK relationship

For a grid in my ASP.NET application, I am trying to get all users and the date of their most recent log entry.

The SQL statement that is currently being used looks something like this:

 SELECT UserId, (other columns),
        LastLogDate = (SELECT TOP (1) [Timestamp]
                       FROM dbo.ActionLog a
                       WHERE a.UserId = u.UserId
                       ORDER BY [Timestamp] DESC)
 FROM dbo.Users u;

and it returns the expected rows - but it is rather slow (about 20 seconds).

My first thought was to add an index on the UserId column of the ActionLog table and include the Timestamp column in it:

 CREATE NONCLUSTERED INDEX [IDX_UserId] ON [dbo].[ActionLog]([UserId] ASC) INCLUDE ([Timestamp]) 

The rows are now returned very quickly - in less than 2 seconds, with 350,000 rows in the ActionLog table - and my index is used just fine, as the execution plan shows. Everything seems fine.

Now, to approximate the production scenario, we loaded approximately 2 million rows into the ActionLog table, 95% or more of which belong to a non-existent user (i.e. these rows have a UserId that does not exist in the Users table).

Now, unexpectedly, the query becomes extremely slow (24 minutes!) and the index is no longer used.

I assumed that, since the vast majority of the entries in the ActionLog table do not match an existing user, I would see a performance boost if I used a filtered index - to "filter out" all those rogue entries without a corresponding user - so I created this index (replacing the one that existed before):

 CREATE NONCLUSTERED INDEX [IDX_UserId]
 ON [dbo].[ActionLog]([UserId] ASC)
 INCLUDE ([Timestamp])
 WHERE UserId <> 'user' -- that's the fixed, non-existent "UserId" I wanted to avoid

But to my horror - the query performs about the same - it takes more than 20 minutes. I updated the statistics - no change - still very slow.

Funny thing (for me): when I dropped the index and recreated it, the query was really fast again (again, less than 3 seconds). WOW!

But as soon as I add more rows again, the query "tips over" and becomes really slow again...

I do not quite understand why this is happening. I thought that with a filtered index that eliminates all these rogue entries, I would see good performance when trying to find the latest ActionLog entry for existing users - but that doesn't seem to be the case.

WHY NOT?

Any ideas? Thoughts? What to try?

+7
performance sql-server query-performance
3 answers

First of all, INCLUDE is not the best choice here. You sort by Timestamp, but included columns are not sorted; they are only stored at the leaf level of the index. A better solution would be to make Timestamp a key column:

 CREATE NONCLUSTERED INDEX [IX_ActionLog_UserIdTimestamp] ON [dbo].[ActionLog] ([UserId], [Timestamp]); 

Secondly, it looks like you might need to update the statistics on your index more often than the automatic updates do. I have seen a situation close to yours where I had to update statistics every 10 minutes because of the volume of inserts. However, this was in 2005.
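
A minimal sketch of the manual statistics refresh described above, assuming the index from this answer (IX_ActionLog_UserIdTimestamp) has been created. FULLSCAN is only one option; a sampled update may be enough on a table this size, and how often to run it is something to measure rather than assume.

```sql
-- Rebuild the statistics for the index from a full scan of the table.
-- Assumes the index IX_ActionLog_UserIdTimestamp from this answer exists.
UPDATE STATISTICS dbo.ActionLog IX_ActionLog_UserIdTimestamp WITH FULLSCAN;

-- Show when these statistics were last updated and how many rows were sampled.
DBCC SHOW_STATISTICS ('dbo.ActionLog', IX_ActionLog_UserIdTimestamp) WITH STAT_HEADER;
```

If the plan only goes bad after bulk loads, running the update right after each load may be a better fit than a fixed schedule.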

+3

Try this query and see how it works with your original index or with the modified index suggested by @Roger Wolf:

 SELECT u.UserId, a.LastLogDate
 FROM dbo.Users u
 INNER JOIN (
     SELECT UserId, MAX([Timestamp]) AS LastLogDate
     FROM dbo.ActionLog
     WHERE UserId <> 'user' -- the user to filter out
     GROUP BY UserId
 ) a ON a.UserId = u.UserId;

If this sucks, I will delete the answer :)
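
One difference from the query in the question: the INNER JOIN drops users that have no ActionLog rows at all, whereas the correlated subquery returns them with a NULL LastLogDate. A sketch of a variant that keeps those users, assuming that is what the grid needs:

```sql
-- Same aggregate as in the query above, but LEFT JOIN keeps users
-- that have no log entries; their LastLogDate comes back as NULL.
SELECT u.UserId, a.LastLogDate
FROM dbo.Users u
LEFT JOIN (
    SELECT UserId, MAX([Timestamp]) AS LastLogDate
    FROM dbo.ActionLog
    WHERE UserId <> 'user'   -- the placeholder user to filter out
    GROUP BY UserId
) a ON a.UserId = u.UserId;
```

Repeating the `UserId <> 'user'` predicate in the query is deliberate: as far as I know, the optimizer generally considers a filtered index only when the query's own predicate guarantees that every row it needs falls inside the index filter.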

+2

Get rid of the subquery:

 SELECT u.UserId, MAX(a.[Timestamp]) AS LastLogDate
 FROM dbo.Users u, dbo.ActionLog a
 WHERE a.UserId = u.UserId
 GROUP BY u.UserId;

Then think about getting other columns.
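
A sketch of one way to bring other columns along, written with explicit JOIN syntax. DisplayName is a hypothetical column used only for illustration; substitute the real columns of the Users table.

```sql
-- DisplayName is a hypothetical Users column, used only for illustration.
-- Every non-aggregated column in the SELECT list must also be in GROUP BY.
SELECT u.UserId, u.DisplayName, MAX(a.[Timestamp]) AS LastLogDate
FROM dbo.Users u
INNER JOIN dbo.ActionLog a ON a.UserId = u.UserId
GROUP BY u.UserId, u.DisplayName;
```

Like the query above, this omits users that have no ActionLog rows.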

-1