Linq to Objects: filtering efficiency issue

Question

Linq to Objects: filtering efficiency issue

I was thinking about how linq is computed, and it made me think:

If i write

var count = collection.Count(o => o.Category == 3);

Will this be done otherwise than:

 var count = collection.Where(o => o.Category == 3).Count();

In the end, IEnumerable<T>.Where() will return an IEnumerable<T> that does not implement the Count property, so the subsequent Count() really need to go through the elements to determine the count that should cause the extra time spent on it.

I wrote some quick test codes to get some metrics, but they seem to beat each other at random. At first I will not enter the test code, but if someone asks, I will pick it up.

So, am I missing something?

+4

performance linq linq-to-objects

danijels Oct 13 '10 at 9:14

source share

4 answers

It will not be much, in fact - both forms will iterate over the collection, check the predicate for each element and count the matches. Both approaches will transmit data - this is not like where Where actually creates a list of all matches in memory, for example.

In the first form there is one (thin) layer of indirection in which everything is. The main reason for using it (IMO) is readability / simplicity, not performance.

+3

Jon skeet Oct 13 '10 at 9:16

source share

I know what you are thinking about. At least I think I know; Count() will look if Count is available as a property and will simply return it if it is. Otherwise, it must list the elements to get the return value.

However, the version of Count() that accepts the predicate always causes the collection to iterate over again, since it must do this to see which ones match.

0

Andrew Barber Oct 13 '10 at 9:38

source share

The above answers find good points, also note that if you separate from any Linq-To-X implementations, deferred execution (Linq to Sql is the main one), expression parameters used in these methods can produce different results.

0

Chris webb Oct 13 '10 at 9:48

source share

Ani · Accepted Answer · 2010-10-13T09:28:20+0000

As John Skeet says, both methods will essentially have to do the same thing - enumerate the sequence, conditionally increasing the counter when the predicate coincides. Any differences in performance between them should be small: minor for almost all use cases. If there is a marker winner, I think that it should be the first, since from the reflector it seems that the Count overload, which uses the predicate, uses its own foreach to list, and not for the more obvious way of unloading, working with a Where stream without Count parameters, as in your second example. This means that Technique No. 1 probably has two minor advantages:

Checks fewer argument checks (null tests, etc.). Technique No. 2 Count also checks if its (tubular) input is ICollection or ICollection<T> , which it cannot be.
One configured enumerator against two counters connected together (an additional state machine has costs).

There is one minor argument in favor of method # 2, though: Where little harder to build an enumerator for the source sequence; it uses another for lists and arrays. This can make it more effective in certain scenarios.

Of course, I have to repeat that I could be wrong in my analysis - reasoning about performance using static code analysis, especially when the differences are likely to be insignificant, is not a good idea. There is only one way to find out - measuring runtime for your particular setup.

FYI, the source I thought of was from .NET 3.5 SP1.

Linq to Objects: filtering efficiency issue

More articles: