Optimize / rewrite LINQ query with GROUP BY and COUNT

I am trying to get the number of unique backgrounds and bars grouped by name in the following dataset.

Id | IsActive | Name | Foo | Bar 1 | 1 | A | 11 | null 2 | 1 | A | 11 | null 3 | 1 | A | null | 123 4 | 1 | B | null | 321 

I expect the result of the above data to be:

 Expected: A = 2; B = 1; 

I tried to group by name, Foo, Bar, and then group again by name with a score to get a row counter. But that did not give me the correct result. (or ToDictionary threw a duplicate key, I played a lot with this, so I can’t remember)

 db.MyEntity .Where(x => x.IsActive) .GroupBy(x => new { x.Name, x.Foo, x.Bar }) .GroupBy(x => new { x.Key.Name, Count = x.Count() }) .ToDictionary(x => x.Key, x => x.Count); 

So, I came up with this LINQ query. But it is rather slow.

 db.MyEntity .Where(x => x.IsActive) .GroupBy(x => x.Name) .ToDictionary(x => x.Key, x => x.Where(y => y.Foo != null).Select(y => y.Foo).Distinct().Count() + x.Where(y => y.Bar != null).Select(y => y.Bar).Distinct().Count()); 

How can I optimize it?

Here is the object for reference

 public class MyEntity { public int Id { get; set; } public bool IsActive { get; set; } public string Name { get; set; } public int? Foo { get; set; } public int? Bar { get; set; } } 

Edit

I also tried this request

 db.MyEntity .Where(x => x.IsActive) .GroupBy(x => new { x.Name, x.Foo, x.Bar }) .GroupBy(x => x.Key.Name) .ToDictionary(x => x.Key, x => x.Count()); 

But this caused a timeout exception :(

+5
source share
4 answers

The query is extremely inefficient because you do most of the work (everything related to building the dictionary) on the client side, not being able to use the database to carry out your forecasts. This is a problem both because the database (especially if these values ​​are indexed) can do this faster than the client, and also because the forecasts in the database are associated with a much smaller amount of data sent over the network.

So just make your predictions before grouping the data.

 var activeItems = db.MyEntity.Where(x => x.IsActive); var query = activeItems.Select(x => new { Name, Value = x.Foo}).Distinct() .Concat(activeItems.Select(x => new { Name, Value = x.Bar}).Distinct()) .Where(x => x != null) .GroupBy(pair => pair.Name) .Select(group => new { group.Key, Count = Group.Count()}) .ToDictionary(pair => pair.Key, pair => pair.Count); 
+4
source

Your task is to create the following query:

 select Name, count(distinct Foo) + count(distinct Bar) from myEntity where IsActive = 1 group by Name 

This is the minimum request to get what you want. But LINQ seems to exaggerate everything as much as possible :)

Your goal is to do as much as possible at the database level. Now your request has been translated into:

 SELECT [Project2].[C1] AS [C1], [Project2].[Name] AS [Name], [Project2].[C2] AS [C2], [Project2].[id] AS [id], [Project2].[IsActive] AS [IsActive], [Project2].[Name1] AS [Name1], [Project2].[Foo] AS [Foo], [Project2].[Bar] AS [Bar] FROM ( SELECT [Distinct1].[Name] AS [Name], 1 AS [C1], [Extent2].[id] AS [id], [Extent2].[IsActive] AS [IsActive], [Extent2].[Name] AS [Name1], [Extent2].[Foo] AS [Foo], [Extent2].[Bar] AS [Bar], CASE WHEN ([Extent2].[id] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C2] FROM (SELECT DISTINCT [Extent1].[Name] AS [Name] FROM [dbo].[SomeTable] AS [Extent1] WHERE [Extent1].[IsActive] = 1 ) AS [Distinct1] LEFT OUTER JOIN [dbo].[SomeTable] AS [Extent2] ON ([Extent2].[IsActive] = 1) AND ([Distinct1].[Name] = [Extent2].[Name]) ) AS [Project2] ORDER BY [Project2].[Name] ASC, [Project2].[C2] ASC 

It selects everything from the database and performs grouping at the application level, which is inefficient.

@Servy request:

 var activeItems = db.MyEntity.Where(x => x.IsActive); var query = activeItems.Select(x => new { Name, Value = x.Foo}).Distinct() .Concat(activeItems.Select(x => new { Name, Value = x.Bar}).Distinct()) .Where(x => x != null) .GroupBy(pair => pair.Name) .Select(group => new { group.Key, Count = Group.Count()}) .ToDictionary(pair => pair.Key, pair => pair.Count); 

translates to:

 SELECT 1 AS [C1], [GroupBy1].[K1] AS [C2], [GroupBy1].[A1] AS [C3] FROM ( SELECT [UnionAll1].[Name] AS [K1], COUNT(1) AS [A1] FROM (SELECT [Distinct1].[Name] AS [Name] FROM ( SELECT DISTINCT [Extent1].[Name] AS [Name], [Extent1].[Foo] AS [Foo] FROM [dbo].[SomeTable] AS [Extent1] WHERE ([Extent1].[IsActive] = 1) AND ([Extent1].[Foo] IS NOT NULL) ) AS [Distinct1] UNION ALL SELECT [Distinct2].[Name] AS [Name] FROM ( SELECT DISTINCT [Extent2].[Name] AS [Name], [Extent2].[Bar] AS [Bar] FROM [dbo].[SomeTable] AS [Extent2] WHERE ([Extent2].[IsActive] = 1) AND ([Extent2].[Bar] IS NOT NULL) ) AS [Distinct2]) AS [UnionAll1] GROUP BY [UnionAll1].[Name] ) AS [GroupBy1] 

This is much better.

I tried the following:

 var activeItems = (from o in db.SomeTables where o.IsActive group o by o.Name into gr select new { gr.Key, cc = gr.Select(c => c.Foo).Distinct().Count(c => c != null) + gr.Select(c => c.Bar).Distinct().Count(c => c != null) }).ToDictionary(c => c.Key); 

This is translated into:

 SELECT 1 AS [C1], [Project5].[Name] AS [Name], [Project5].[C1] + [Project5].[C2] AS [C2] FROM ( SELECT [Project3].[Name] AS [Name], [Project3].[C1] AS [C1], (SELECT COUNT(1) AS [A1] FROM ( SELECT DISTINCT [Extent3].[Bar] AS [Bar] FROM [dbo].[SomeTable] AS [Extent3] WHERE ([Extent3].[IsActive] = 1) AND ([Project3].[Name] = [Extent3].[Name]) AND ([Extent3].[Bar] IS NOT NULL) ) AS [Distinct3]) AS [C2] FROM ( SELECT [Distinct1].[Name] AS [Name], (SELECT COUNT(1) AS [A1] FROM ( SELECT DISTINCT [Extent2].[Foo] AS [Foo] FROM [dbo].[SomeTable] AS [Extent2] WHERE ([Extent2].[IsActive] = 1) AND ([Distinct1].[Name] = [Extent2].[Name]) AND ([Extent2].[Foo] IS NOT NULL) ) AS [Distinct2]) AS [C1] FROM ( SELECT DISTINCT [Extent1].[Name] AS [Name] FROM [dbo].[SomeTable] AS [Extent1] WHERE [Extent1].[IsActive] = 1 ) AS [Distinct1] ) AS [Project3] ) AS [Project5] 

The same thing, but without alliances, as in the second version.

Conclusion:

I would create a view and import it into the model if the table is quite large and performance is critical. Otherwise, stick with version 3 or version 2 of @Servy. Performance should be checked, of course.

+1
source

I think you can change your initial request a bit to get what you want:

 db.MyEntity .Where(x => x.IsActive) .GroupBy(x => new { x.Name, x.Foo, x.Bar }) .GroupBy(x => x.Key.Name) .ToDictionary(x => x.Key, x => x.Count()); 

When you add Count() to the second group, you count the duplicate values ​​for the three-part key. You only want to count different values ​​for each three-part key, so you calculate after grouping on Name .

0
source

Only advice on the subject may not use DISTINCT for best performance. Use grouping.

Please look at the link

-2
source

All Articles