UPDATE thanks to @usr, I got this up to ~ 3 seconds just by changing
.Select( log => log.OrderByDescending( d => d.DateTimeUTC ).FirstOrDefault() )
to
.Select( log => log.OrderByDescending( d => d.Id ).FirstOrDefault() )
I have a database with two tables - logs and collectors, which I use to read the Entity Framework. There are 86 collector entries, and each has 50,000+ corresponding log entries.
I want to get the most recent log entry for each collector, which is easily executed using this SQL
SELECT CollectorLogModels_1.Status, CollectorLogModels_1.NumericValue, CollectorLogModels_1.StringValue, CollectorLogModels_1.DateTimeUTC, CollectorSettingsModels.Target, CollectorSettingsModels.TypeName FROM (SELECT CollectorId, MAX(Id) AS Id FROM CollectorLogModels GROUP BY CollectorId) AS RecentLogs INNER JOIN CollectorLogModels AS CollectorLogModels_1 ON RecentLogs.Id = CollectorLogModels_1.Id INNER JOIN CollectorSettingsModels ON CollectorLogModels_1.CollectorId = CollectorSettingsModels.Id
It takes ~ 2 seconds.
The closest I could contact with LINQ is the following
var logs = context.Logs.Include(co => co.Collector) .GroupBy( log => log.CollectorId, log => log ) .Select( log => log.OrderByDescending( d => d.DateTimeUtc ).FirstOrDefault() ) .Join( context.Collectors, (l => l.CollectorId), (c => c.Id), (l, c) => new { c.Target, DateTimeUTC = l.DateTimeUtc, l.Status, l.StringValue, CollectorName = c.TypeName } ).OrderBy( o => o.Target ).ThenBy( o => o.CollectorName ) ;
This gives the results I want, but takes ~ 35 seconds to complete.
It becomes the next SQL
SELECT [Distinct1].[CollectorId] AS [CollectorId], [Extent3].[Target] AS [Target], [Limit1].[DateTimeUtc] AS [DateTimeUtc], [Limit1].[Status] AS [Status], [Limit1].[StringValue] AS [StringValue], [Extent3].[TypeName] AS [TypeName] FROM (SELECT DISTINCT [Extent1].[CollectorId] AS [CollectorId] FROM [dbo].[CollectorLogModels] AS [Extent1] ) AS [Distinct1] OUTER APPLY (SELECT TOP (1) [Project2].[Status] AS [Status], [Project2].[StringValue] AS [StringValue], [Project2].[DateTimeUtc] AS [DateTimeUtc], [Project2].[CollectorId] AS [CollectorId] FROM ( SELECT [Extent2].[Status] AS [Status], [Extent2].[StringValue] AS [StringValue], [Extent2].[DateTimeUtc] AS [DateTimeUtc], [Extent2].[CollectorId] AS [CollectorId] FROM [dbo].[CollectorLogModels] AS [Extent2] WHERE [Distinct1].[CollectorId] = [Extent2].[CollectorId] ) AS [Project2] ORDER BY [Project2].[DateTimeUtc] DESC ) AS [Limit1] INNER JOIN [dbo].[CollectorSettingsModels] AS [Extent3] ON [Limit1].[CollectorId] = [Extent3].[Id] ORDER BY [Extent3].[Target] ASC, [Extent3].[TypeName] ASC
How can I improve performance over what is possible with SQL?