
Does LINQ cache computed values?

Suppose I have the following code:

    var X = XElement.Parse(@"
        <ROOT>
            <MUL v='2' />
            <MUL v='3' />
        </ROOT>
    ");

    Enumerable.Range(1, 100)
        .Select(s => X.Elements()
            .Select(t => Int32.Parse(t.Attribute("v").Value))
            .Aggregate(s, (t, u) => t * u))
        .ToList()
        .ForEach(s => Console.WriteLine(s));

What is the .NET runtime actually doing here? Is it parsing and converting the attributes to integers on each of the 100 iterations, or is it smart enough to realize that it should cache the parsed values and not repeat the calculations for each element in the range?

Also, how would I figure this out myself?

Thanks in advance for your help.

2 answers

It has been a while since I dug through this code, but IIRC, the way Select works is simply to store the Func you supply and run it on each element of the source collection, one at a time. Thus, for each element in the outer range, it will execute the inner Select/Aggregate sequence as if it were the first time. There is no built-in caching; you would have to implement that yourself in the expressions.

If you want to find out on your own, you have a few main options:

  • Compile the code and use ildasm to view the IL; this is the most accurate, but especially with lambdas and closures, what you get from the IL may look nothing like what you fed into the C# compiler.
  • Use something like dotPeek to decompile System.Linq.dll into C#; again, what you get from these kinds of tools may only roughly resemble the original source code, but at least it will be C# (and dotPeek in particular does a pretty good job and is free).
  • My personal preference: download the .NET 4.0 Reference Source and look for yourself; that is what it is for :) You just have to trust MS that the reference source corresponds to the actual source used to build the binaries, but I see no good reason to doubt them.
  • As pointed out by @AllonGuralnek, you can set breakpoints on specific lambda expressions within a single line; place the cursor somewhere inside the body of the lambda and press F9, and it will stop only on the lambda. (If you do it wrong, it will highlight the entire line in the breakpoint color; if you do it right, it will highlight just the lambda.)
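
Besides the options above, a quick empirical check is to instrument the code and count the calls. The following is only a sketch: the class name ParseCounter and the countingParse wrapper are made up for illustration and are not part of the question's code.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class ParseCounter
{
    // Runs the query from the question and returns how many times
    // the (instrumented) parse function was invoked.
    public static int CountParses()
    {
        var x = XElement.Parse("<ROOT><MUL v='2' /><MUL v='3' /></ROOT>");
        int calls = 0;

        // Hypothetical wrapper around Int32.Parse that counts invocations.
        Func<string, int> countingParse = text =>
        {
            calls++;
            return Int32.Parse(text);
        };

        // Same shape as the original query; ToList() forces evaluation.
        var results = Enumerable.Range(1, 100)
            .Select(s => x.Elements()
                .Select(t => countingParse(t.Attribute("v").Value))
                .Aggregate(s, (t, u) => t * u))
            .ToList();

        return calls;
    }

    public static void Main()
    {
        Console.WriteLine(CountParses()); // prints 200 (100 outer items x 2 attributes)
    }
}
```

A count of 200 rather than 2 indicates that nothing is cached between outer iterations.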

LINQ and IEnumerable<T> are pull based. This means that the predicates and actions that are part of a LINQ statement are, in general, not executed until values are pulled. In addition, the predicates and actions are executed every time values are pulled (i.e., there is no secret caching).

Pulling from an IEnumerable<T> is done by a foreach statement, which is really syntactic sugar for getting an enumerator by calling IEnumerable<T>.GetEnumerator() and then repeatedly calling IEnumerator<T>.MoveNext() to retrieve the values.
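
As a rough illustration of that desugaring, the loop below does by hand approximately what foreach does for you. The class and method names (ForeachDesugared, PullAll) are invented for this sketch, and the real compiler expansion differs in some details.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ForeachDesugared
{
    // Pulls every value out of the sequence without using foreach.
    public static List<int> PullAll(IEnumerable<int> source)
    {
        var pulled = new List<int>();

        // Roughly what "foreach (int n in source) pulled.Add(n);" expands to.
        using (IEnumerator<int> e = source.GetEnumerator())
        {
            while (e.MoveNext())       // each MoveNext() pulls one value
            {
                pulled.Add(e.Current); // Current exposes the value just pulled
            }
        }

        return pulled;
    }

    public static void Main()
    {
        var values = PullAll(Enumerable.Range(1, 3).Select(n => n * 10));
        Console.WriteLine(string.Join(", ", values)); // prints 10, 20, 30
    }
}
```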

LINQ operators such as ToList(), ToArray(), ToDictionary() and ToLookup() wrap a foreach statement, so these methods do the pulling. The same can be said about operators such as Aggregate(), Count() and First(). What these methods have in common is that they produce a single result, which has to be created by executing a foreach statement.

Many LINQ operators create a new IEnumerable<T> sequence. When an element is pulled from the resulting sequence, the operator pulls one or more elements from the source sequence. The Select() operator is the most obvious example, but other examples are SelectMany(), Where(), Concat(), Union(), Distinct(), Skip() and Take(). These operators do no caching. When the Nth element is pulled from Select(), it pulls the Nth element from the source sequence, applies the projection using the supplied function, and returns the result. Nothing secret happens here.
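
The absence of caching in Select() can be observed by counting how often the projection runs. This is a minimal sketch; the class name SelectIsLazy and the counter are assumptions added for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SelectIsLazy
{
    // Returns the projection call count at three points in time:
    // after building the query, after one enumeration, after two.
    public static int[] Run()
    {
        int calls = 0;

        // Building the query runs nothing; the lambda is only stored.
        IEnumerable<int> doubled = new[] { 1, 2, 3 }
            .Select(n => { calls++; return n * 2; });
        int afterBuild = calls;

        doubled.ToList();          // first enumeration runs the lambda 3 times
        int afterFirstPass = calls;

        doubled.ToList();          // second enumeration runs it 3 more times
        int afterSecondPass = calls;

        return new[] { afterBuild, afterFirstPass, afterSecondPass };
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Run())); // prints 0, 3, 6
    }
}
```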

Other LINQ operators also create new IEnumerable<T> sequences, but they are implemented by pulling the entire source sequence, doing their work, and then producing a new sequence. These methods include Reverse(), OrderBy() and GroupBy(). However, the pulling done by the operator only happens when the operator itself is pulled, which means that you still need a foreach "at the end" of the LINQ statement before anything is done. You could argue that these operators use a cache because they immediately retrieve the entire source sequence. However, this cache is rebuilt every time the operator is iterated, so it is really an implementation detail, not something that will magically detect that you apply the same OrderBy() operation several times to the same sequence.
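
To see this buffering behaviour, one can count how many elements OrderBy() pulls from its source when only the first sorted element is requested. The sketch below uses an invented counting iterator (Source) and class name (OrderByBuffers); it is not taken from the LINQ implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OrderByBuffers
{
    private static int pulledFromSource;

    // Hypothetical source that counts every element handed out.
    private static IEnumerable<int> Source()
    {
        foreach (var n in new[] { 3, 1, 2 })
        {
            pulledFromSource++;
            yield return n;
        }
    }

    public static int[] Run()
    {
        pulledFromSource = 0;

        IEnumerable<int> ordered = Source().OrderBy(n => n);
        int afterBuild = pulledFromSource;      // nothing pulled yet

        using (var e = ordered.GetEnumerator())
        {
            e.MoveNext();                       // ask for the first element only
        }
        int afterOneElement = pulledFromSource; // the whole source was pulled

        using (var e = ordered.GetEnumerator())
        {
            e.MoveNext();                       // iterate again: buffer is rebuilt
        }
        int afterSecondPass = pulledFromSource;

        return new[] { afterBuild, afterOneElement, afterSecondPass };
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Run())); // prints 0, 3, 6
    }
}
```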


In your example, ToList() does the pulling. The function in the outer Select will be executed 100 times. Each time it is executed, Aggregate() performs another pull that parses the XML attributes. In total, the code will call Int32.Parse() 200 times.

You can improve this by pulling the attributes once instead of each iteration:

    var X = XElement.Parse(@"
        <ROOT>
            <MUL v='2' />
            <MUL v='3' />
        </ROOT>
    ")
        .Elements()
        .Select(t => Int32.Parse(t.Attribute("v").Value))
        .ToList();  // parse the attributes once, up front

    Enumerable.Range(1, 100)
        .Select(s => X.Aggregate(s, (t, u) => t * u))
        .ToList()
        .ForEach(s => Console.WriteLine(s));

Now Int32.Parse() is called only 2 times. However, the cost is that a list of attribute values has to be allocated, stored, and eventually garbage collected. (Not a big problem when the list contains two items.)

Please note that if you forget the first ToList() that pulls the attributes, the code will still work, but with the same performance characteristics as the original code. No space is used to store the attributes, but they are parsed again on each iteration.
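
The difference between keeping and forgetting that ToList() can be checked with a call counter. This is a sketch under assumptions: the class CacheWithToList, the materialize flag and the counter are illustrative additions, not part of the code above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class CacheWithToList
{
    // Returns how many times Int32.Parse ran, with or without
    // materializing the parsed attributes into a list first.
    public static int CountParses(bool materialize)
    {
        int calls = 0;

        IEnumerable<int> factors = XElement
            .Parse("<ROOT><MUL v='2' /><MUL v='3' /></ROOT>")
            .Elements()
            .Select(t => { calls++; return Int32.Parse(t.Attribute("v").Value); });

        if (materialize)
        {
            factors = factors.ToList(); // parse once and keep the results
        }

        // Each outer element enumerates 'factors' once via Aggregate().
        var products = Enumerable.Range(1, 100)
            .Select(s => factors.Aggregate(s, (t, u) => t * u))
            .ToList();

        return calls;
    }

    public static void Main()
    {
        Console.WriteLine(CountParses(true));  // prints 2
        Console.WriteLine(CountParses(false)); // prints 200
    }
}
```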
