How to execute LINQ GroupBy, Select, and then Take without increasing performance?

Background

I have an SQL dataset that is called as a view through LINQ-to-Entities. The goal is to provide outstanding balances on the credit report account, which is 30 days, with the exception of 60 days, etc.

Providing you with an example table is too difficult to format here in StackOverflow, but here is an SQL SELECT statement that should give you an idea of ​​the original data structure:

SELECT TOP 1000 [TransactionId] ,[IndustrySector] ,[DataContributorId] ,[ExperienceMonth] ,[ExperienceMonthText] ,[Balance] ,[ARCurrent] ,[AR1to30PD] ,[AR31to60PD] ,[AR61to90PD] ,[Ar91PlusPD] ,[WeightedDTP] FROM [BCC].[dbo].[vwTransactionExperienceDetail] 

Now, when I call this view via LINQ, the ultimate goal is to create an object that will be returned as JSON to the requesting client. The resulting object should be a hierarchy of Industry groups, and then Contributors (from the data presented) and, finally, a separate Reports . To do this, the following LINQ query works fine and pretty fast:

  /// <summary> /// Gets the 25 month experience detail report with summed parameters (balance, DTP, etc). /// </summary> /// <param name="id">The transaction id.</param> /// <returns>List&lt;ExperienceDetail&gt;</returns> public static List<ExperienceDetail> Get25MonthExperienceDetail_Sum(int id) { var db = new BCCEntities(); return db.vwTransactionExperienceDetails.Where(te => te.TransactionId == id) .GroupBy(g => g.IndustrySector) .Select(i => new ExperienceDetail { Industry = i.Key, NumberOfContributors = i.GroupBy(c => c.DataContributorId).Count(), Balance = i.Sum(s => s.Balance), OneToThirty = i.Sum(s => s.ARCurrent), ThirtyOneToSixty = i.Sum(s => s.AR1to30PD), SixtyOneToNinety = i.Sum(s => s.AR31to60PD), NinetyOneToOneTwenty = i.Sum(s => s.AR61to90PD), OneTwentyOnePlus = i.Sum(s => s.Ar91PlusPD), DTP = (i.Sum(s => s.Balance) != 0) ? i.Sum(s => s.WeightedDTP) / i.Sum(s => s.Balance) : i.Sum(s => s.WeightedDTP), Contributions = i.GroupBy(dc => dc.DataContributorId).Select(c => new Contribution { Balance = c.Sum(s => s.Balance), OneToThirty = c.Sum(s => s.ARCurrent), ThirtyOneToSixty = c.Sum(s => s.AR1to30PD), SixtyOneToNinety = c.Sum(s => s.AR31to60PD), NinetyOneToOneTwenty = c.Sum(s => s.AR61to90PD), OneTwentyOnePlus = c.Sum(s => s.Ar91PlusPD), DTP = (c.Sum(s => s.Balance) != 0) ? c.Sum(s => s.WeightedDTP) / c.Sum(s => s.Balance) : c.Sum(s => s.WeightedDTP), ContributorId = c.Key, Reports = c.Select(r => new Report { DTP = (r.Balance != 0) ? r.WeightedDTP/r.Balance : r.WeightedDTP, ReportDate = r.ExperienceMonth, Balance = r.Balance, OneToThirty = r.ARCurrent, ThirtyOneToSixty = r.AR1to30PD, SixtyOneToNinety = r.AR31to60PD, NinetyOneToOneTwenty = r.AR61to90PD, OneTwentyOnePlus = r.Ar91PlusPD, ContributorId = r.DataContributorId, Industry = i.Key }) }) }).ToList(); } 

Problem

I need to create an additional service that provides the same data, but only for the last month, reported by each contributor ( DataContributorId ). The following LINQ query works for this, but EXTREMELY is slow slow - it takes almost a full minute to return the results:

  /// <summary> /// Gets an experience detail report with summed parameters (balance, DTP, etc) for the most recent month. /// </summary> /// <param name="id">The transaction id.</param> /// <returns>List&lt;ExperienceDetail&gt;</returns> public static List<ExperienceDetail> Get25MonthExperienceDetail_MostRecentMonth(int id) { var db = new BCCEntities(); db.CommandTimeout = 100000; return db.vwTransactionExperienceDetails.Where(te => te.TransactionId == id) .OrderByDescending(o => o.ExperienceMonth) .GroupBy(g => g.IndustrySector) .Select(i => new ExperienceDetail { Industry = i.Key, NumberOfContributors = i.GroupBy(c => c.DataContributorId).Count(), Balance = i.GroupBy(dc => dc.DataContributorId).Sum(x => x.Select(z => z.Balance).FirstOrDefault()), OneToThirty = i.Sum(s => s.ARCurrent), ThirtyOneToSixty = i.Sum(s => s.AR1to30PD), SixtyOneToNinety = i.Sum(s => s.AR31to60PD), NinetyOneToOneTwenty = i.Sum(s => s.AR61to90PD), OneTwentyOnePlus = i.Sum(s => s.Ar91PlusPD), DTP = (i.Sum(s => s.Balance) != 0) ? i.Sum(s => s.WeightedDTP) / i.Sum(s => s.Balance) : i.Sum(s => s.WeightedDTP), Contributions = i.GroupBy(dc => dc.DataContributorId).Select(c => new Contribution { Balance = c.Take(1).Sum(s => s.Balance), OneToThirty = c.Take(1).Sum(s => s.ARCurrent), ThirtyOneToSixty = c.Take(1).Sum(s => s.AR1to30PD), SixtyOneToNinety = c.Take(1).Sum(s => s.AR31to60PD), NinetyOneToOneTwenty = c.Take(1).Sum(s => s.AR61to90PD), OneTwentyOnePlus = c.Take(1).Sum(s => s.Ar91PlusPD), DTP = (c.Take(1).Sum(s => s.Balance) != 0) ? c.Take(1).Sum(s => s.WeightedDTP) / c.Take(1).Sum(s => s.Balance) : c.Take(1).Sum(s => s.WeightedDTP), ContributorId = c.Key, Reports = c.Select(r => new Report { DTP = (r.Balance != 0) ? r.WeightedDTP / r.Balance : r.WeightedDTP, ReportDate = r.ExperienceMonth, Balance = r.Balance, OneToThirty = r.ARCurrent, ThirtyOneToSixty = r.AR1to30PD, SixtyOneToNinety = r.AR31to60PD, NinetyOneToOneTwenty = r.AR61to90PD, OneTwentyOnePlus = r.Ar91PlusPD, ContributorId = r.DataContributorId, Industry = i.Key }).Take(1) }) }).ToList(); } 

Question

How can I execute a query for the "Last Month" result set without losing performance? Over the past few hours, I have tried to isolate the part of the request that takes the most time, and I cannot notice it. Admittedly, I don't know how to efficiently profile performance issues with complex LINQ queries, and I'm open to comment.

The question ultimately arises: is there an alternative to this LINQ query that will produce the same set of results without such a serious performance hit?

Thanks in advance.

+4
source share
2 answers

Assuming the dataset is small enough, I would just drag in all the months, go to ToList() , and then filter out only the last month in memory. LINQ can do some really weird things when the query gets complicated.

+1
source

In the second query, you added:

Balance = i.GroupBy(dc => dc.DataContributorId).Sum(x => x.Select(z => z.Balance).FirstOrDefault()),

and

.OrderByDescending(o => o.ExperienceMonth)

Try removing groupBy and orderBy to see if they are caused by one of the performance issues, in this case check (and try) to add an index to this column (if one is missing).

Also check SQL Profiler (if SQL Server 2005 or lower) or SQL Extended Events (if SQL Server 2008 or higher).

You can try the LinqPAD tools to check the SQL DML generated by the query

There are other methods for retrieving data from a database:

+1
source

All Articles