How can I improve the performance of this LINQ?

UPDATE thanks to @usr, I got this up to ~ 3 seconds just by changing

.Select( log => log.OrderByDescending( d => d.DateTimeUTC ).FirstOrDefault() ) 

to

  .Select( log => log.OrderByDescending( d => d.Id ).FirstOrDefault() ) 

I have a database with two tables - logs and collectors, which I use to read the Entity Framework. There are 86 collector entries, and each has 50,000+ corresponding log entries.

I want to get the most recent log entry for each collector, which is easily executed using this SQL

 SELECT CollectorLogModels_1.Status, CollectorLogModels_1.NumericValue, CollectorLogModels_1.StringValue, CollectorLogModels_1.DateTimeUTC, CollectorSettingsModels.Target, CollectorSettingsModels.TypeName FROM (SELECT CollectorId, MAX(Id) AS Id FROM CollectorLogModels GROUP BY CollectorId) AS RecentLogs INNER JOIN CollectorLogModels AS CollectorLogModels_1 ON RecentLogs.Id = CollectorLogModels_1.Id INNER JOIN CollectorSettingsModels ON CollectorLogModels_1.CollectorId = CollectorSettingsModels.Id 

It takes ~ 2 seconds.

The closest I could contact with LINQ is the following

 var logs = context.Logs.Include(co => co.Collector) .GroupBy( log => log.CollectorId, log => log ) .Select( log => log.OrderByDescending( d => d.DateTimeUtc ).FirstOrDefault() ) .Join( context.Collectors, (l => l.CollectorId), (c => c.Id), (l, c) => new { c.Target, DateTimeUTC = l.DateTimeUtc, l.Status, l.StringValue, CollectorName = c.TypeName } ).OrderBy( o => o.Target ).ThenBy( o => o.CollectorName ) ; 

This gives the results I want, but takes ~ 35 seconds to complete.

It becomes the next SQL

 SELECT [Distinct1].[CollectorId] AS [CollectorId], [Extent3].[Target] AS [Target], [Limit1].[DateTimeUtc] AS [DateTimeUtc], [Limit1].[Status] AS [Status], [Limit1].[StringValue] AS [StringValue], [Extent3].[TypeName] AS [TypeName] FROM (SELECT DISTINCT [Extent1].[CollectorId] AS [CollectorId] FROM [dbo].[CollectorLogModels] AS [Extent1] ) AS [Distinct1] OUTER APPLY (SELECT TOP (1) [Project2].[Status] AS [Status], [Project2].[StringValue] AS [StringValue], [Project2].[DateTimeUtc] AS [DateTimeUtc], [Project2].[CollectorId] AS [CollectorId] FROM ( SELECT [Extent2].[Status] AS [Status], [Extent2].[StringValue] AS [StringValue], [Extent2].[DateTimeUtc] AS [DateTimeUtc], [Extent2].[CollectorId] AS [CollectorId] FROM [dbo].[CollectorLogModels] AS [Extent2] WHERE [Distinct1].[CollectorId] = [Extent2].[CollectorId] ) AS [Project2] ORDER BY [Project2].[DateTimeUtc] DESC ) AS [Limit1] INNER JOIN [dbo].[CollectorSettingsModels] AS [Extent3] ON [Limit1].[CollectorId] = [Extent3].[Id] ORDER BY [Extent3].[Target] ASC, [Extent3].[TypeName] ASC 

How can I improve performance over what is possible with SQL?

+5
source share
2 answers

In the source SQL, you can select the DateTimeUTC collection from a different row than MAX (ID). This is probably a mistake. EF does not have this problem. This is not semantically identical, it is a more complex query.

If you rewrite an EF query structurally the same as an SQL query, you get identical performance. I don’t see anything here so that EF does not support.

Calculate max(id) using EF and join this.

0
source

I had the same problem, I decided to add indexes to it.

My request will take 45 seconds and I was able to complete it in less than a second.

0
source

All Articles