Can someone explain why the third query is several orders of magnitude slower than the others when it should not take longer than doing the first two in the sequence?
var data = Enumerable.Range(0, 10000).Select(x => new { Index = x, Value = x + " is the magic number"}).ToList(); var test1 = data.Select(x => new { Original = x, Match = data.Single(y => y.Value == x.Value) }).Take(1).Dump(); var test2 = data.Select(x => new { Original = x, Match = data.Single(z => z.Index == x.Index) }).Take(1).Dump(); var test3 = data.Select(x => new { Original = x, Match = data.Single(z => z.Index == data.Single(y => y.Value == x.Value).Index) }).Take(1).Dump();
EDIT: I added .ToList () to the original data generation because I do not want to re-generate the data to cloud the problem.
I'm just trying to understand why this code is so slow, by the way, not looking for a faster alternative, unless it sheds light on this question. I would think that if Linq is lazily evaluated and I am only looking for the first element (Take (1)), then test3's:
data.Select(x => new { Original = x, Match = data.Single(z => z.Index == data.Single(y => y.Value == x.Value).Index) }).Take(1);
may decrease to:
data.Select(x => new { Original = x, Match = data.Single(z => z.Index == 1) }).Take(1)
to O (N), since the first data item is successfully matched after one full scan of the data using the internal Single (), leaving another scan of the data remaining with Single (). So, all is O (N).
Obviously, it is being processed in a longer way, but I really don't understand how and why.
Test3 takes a couple of seconds to work between each other, so I think we can safely assume that if your answer contains the number 10 ^ 16, you made a mistake somewhere along the line.