Why is this LINQ so slow?

Can someone explain why the third query is several orders of magnitude slower than the others, when it shouldn't take longer than running the first two in sequence?

    var data = Enumerable.Range(0, 10000)
        .Select(x => new { Index = x, Value = x + " is the magic number" })
        .ToList();

    var test1 = data.Select(x => new { Original = x,
        Match = data.Single(y => y.Value == x.Value) }).Take(1).Dump();

    var test2 = data.Select(x => new { Original = x,
        Match = data.Single(z => z.Index == x.Index) }).Take(1).Dump();

    var test3 = data.Select(x => new { Original = x,
        Match = data.Single(z => z.Index == data.Single(y => y.Value == x.Value).Index) }).Take(1).Dump();

EDIT: I added .ToList() to the original data generation because I don't want repeated data generation clouding the issue.

I'm just trying to understand why this code is so slow, by the way, not looking for a faster alternative, unless it sheds light on the question. I would think that since LINQ is lazily evaluated and I am only looking for the first element (Take(1)), test3's:

 data.Select(x => new { Original = x, Match = data.Single(z => z.Index == data.Single(y => y.Value == x.Value).Index) }).Take(1); 

might reduce to:

 data.Select(x => new { Original = x, Match = data.Single(z => z.Index == 0) }).Take(1) 

running in O(N), since the first data item is matched after one full scan of the data by the inner Single(), leaving just one more scan of the data for the outer Single(). So everything is O(N).

Obviously it is being processed in some longer way, but I don't really understand how or why.

Test3 takes a couple of seconds to run, by the way, so I think we can safely assume that if your answer contains the number 10^16, you made a mistake somewhere along the line.
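The intuition about deferred execution is right as far as it goes. A quick counter sketch (hypothetical, not from the original post) confirms that Take(1) really does limit Select to a single projection, so whatever makes test3 slow must live inside that one projection:

```csharp
using System;
using System.Linq;

int projected = 0;
var data = Enumerable.Range(0, 10000).ToList();

// Building the query runs nothing: Select and Take are both deferred.
var query = data.Select(x => { projected++; return x * 2; }).Take(1);
Console.WriteLine(projected); // 0 - nothing has executed yet

// Enumerating it projects exactly one element, then stops.
var first = query.ToList();
Console.WriteLine(projected); // 1
```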

2 answers

The first two "tests" are essentially identical in cost, and both are slow. The third adds another level of slowness.

The first two LINQ queries here are quadratic in nature. Because finding your "Match" element requires iterating over the entire "data" sequence, each lookup takes longer as you move through the range. For example, the 10,000th element forces the engine to walk all 10,000 elements of the original sequence to find its match, which makes the full operation O(N^2).

"test3" takes this to a whole new level of pain, because the inner Single() is re-evaluated for every candidate element the outer Single() examines, nesting one full scan inside another.

Each data.Single(...) over the list is an O(N) scan, so in test3 every projected element pays O(N) outer candidates times O(N) inner comparisons, i.e. O(N^2) per element and O(N^3) for the full query, which is orders of magnitude slower.
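To see the nesting concretely, here is a sketch (with n reduced to 1,000 so it finishes instantly; the variable names are illustrative, not from the original post) that counts how often each predicate actually executes in the Take(1) case:

```csharp
using System;
using System.Linq;

const int n = 1000; // smaller than the original 10,000 so it runs quickly
var data = Enumerable.Range(0, n)
    .Select(x => new { Index = x, Value = x + " is the magic number" })
    .ToList();

long outerScans = 0, innerScans = 0;

// test3 with counters bolted onto both predicates.
var result = data
    .Select(x => new
    {
        Original = x,
        Match = data.Single(z =>
        {
            outerScans++;
            return z.Index == data.Single(y =>
            {
                innerScans++;
                return y.Value == x.Value;
            }).Index;
        })
    })
    .Take(1)
    .ToList();

// Single always walks its entire source, so even for one projected element
// the outer predicate runs n times, and the inner Single (n comparisons per
// call) is re-evaluated for every one of those n candidates: n * n in total.
Console.WriteLine(outerScans); // 1000
Console.WriteLine(innerScans); // 1000000
```

Even with Take(1), the single projected element costs n^2 inner comparisons; without Take(1), the full query multiplies that by n again.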


Fixed:

    var data = Enumerable.Range(0, 10000)
        .Select(x => new { Index = x, Value = x + " is the magic number" })
        .ToList();

    var forward = data.ToLookup(x => x.Index);
    var backward = data.ToLookup(x => x.Value);

    var test1 = data.Select(x => new { Original = x,
        Match = backward[x.Value].Single() }).Take(1).Dump();

    var test2 = data.Select(x => new { Original = x,
        Match = forward[x.Index].Single() }).Take(1).Dump();

    var test3 = data.Select(x => new { Original = x,
        Match = forward[backward[x.Value].Single().Index].Single() }).Take(1).Dump();

In the original code:

  • data.ToList() enumerates 10,000 instances (10^4).
  • data.Select(x => data.Single(...)).ToList() enumerates 100,000,000 instances (10^8).
  • data.Select(x => data.Single(z => data.Single(...))).ToList() enumerates 1,000,000,000,000 instances (10^12), because the inner Single() is re-run for every candidate the outer Single() examines.
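By contrast, the lookup-based fix is linear: one O(N) pass builds each lookup, and every indexer access afterwards is a constant-time hash probe. A small sketch of that claim, assuming the same data shape (the `joined` variable is illustrative):

```csharp
using System;
using System.Linq;

var data = Enumerable.Range(0, 10000)
    .Select(x => new { Index = x, Value = x + " is the magic number" })
    .ToList();

// One O(N) pass builds a hash-based lookup keyed by Value...
var backward = data.ToLookup(x => x.Value);

// ...after which each indexer access is a constant-time probe, so joining
// every element back to its match is O(N) overall instead of O(N^2).
var joined = data
    .Select(x => new { Original = x, Match = backward[x.Value].Single() })
    .ToList();

Console.WriteLine(joined.Count); // 10000
// Values are unique, so every element matches itself, same object reference.
Console.WriteLine(joined.All(p => ReferenceEquals(p.Original, p.Match))); // True
```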

Single and First are different: Single throws if there is more than one match, so it must enumerate its entire source to rule out a second match, while First can stop as soon as it finds one.
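A small sketch using a counting iterator (hypothetical, not from the original answer) makes that difference visible:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int visited = 0;

// Yields 0..9, counting how many elements the caller actually pulls.
IEnumerable<int> Counted()
{
    for (int i = 0; i < 10; i++)
    {
        visited++;
        yield return i;
    }
}

visited = 0;
Counted().First(x => x == 0);   // stops at the first match
Console.WriteLine(visited);     // 1

visited = 0;
Counted().Single(x => x == 0);  // keeps scanning to rule out a second match
Console.WriteLine(visited);     // 10
```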


Source: https://habr.com/ru/post/1314452/

