Is LINQ's order of execution the cause of this gotcha?

I have this extension method to repeat a sequence:

    public static List<T> Repeat<T>(this IEnumerable<T> lst, int count)
    {
        if (count < 0)
            throw new ArgumentOutOfRangeException("count");

        var ret = Enumerable.Empty<T>();

        for (var i = 0; i < count; i++)
            ret = ret.Concat(lst);

        return ret.ToList();
    }

Now if I do:

    var d = Enumerable.Range(1, 100);
    var f = d.Select(t => new Person()).Repeat(10);
    int i = f.Distinct().Count();

I expect i to be 100, but it gives me 1000! Why is this happening? Shouldn't LINQ be smart enough to first select the 100 persons and only then concatenate them into the variable ret? I get the feeling that Concat is somehow given precedence over the Select it is combined with when the query is finally executed at ret.ToList().

Edit:

If I do this, I get the correct result as expected:

    var f = d.Select(t => new Person()).ToList().Repeat(10);
    int i = f.Distinct().Count(); // i is 100

Edit again:

I have not overridden Equals. I'm just trying to get 100 unique persons (by reference, of course). My question is: can someone explain why LINQ does not perform the Select first and the concatenation afterwards (at the time of execution, of course)?

3 answers

The problem is that unless you call ToList, d.Select(t => new Person()) is re-enumerated each time Repeat goes through the sequence, creating new Person objects on every pass. This behavior is known as deferred execution.

In general, LINQ does not assume that every time it enumerates a sequence it will get the same sequence, or even a sequence of the same length. If this effect is undesirable, you can always "materialize" the sequence inside your Repeat method by calling ToList right away, for example:

    public static List<T> Repeat<T>(this IEnumerable<T> lstEnum, int count)
    {
        if (count < 0)
            throw new ArgumentOutOfRangeException("count");

        var lst = lstEnum.ToList(); // Enumerate only once
        var ret = Enumerable.Empty<T>();

        for (var i = 0; i < count; i++)
            ret = ret.Concat(lst);

        return ret.ToList();
    }
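To see deferred execution in isolation, here is a minimal sketch that counts how often the Select lambda runs. The class name DeferredExecutionDemo and the Calls counter are made up for illustration and are not part of the question's code.

```csharp
using System;
using System.Linq;

public static class DeferredExecutionDemo
{
    // Hypothetical counter: how many times the Select lambda has run so far.
    public static int Calls;

    // Returns the counter value before enumeration, after one pass, and after two passes.
    public static int[] Run()
    {
        Calls = 0;

        // Building the query runs nothing; the lambda is only stored.
        var query = Enumerable.Range(1, 100).Select(n => { Calls++; return new object(); });
        int beforeEnumeration = Calls;   // 0

        var first = query.ToList();      // first pass: the lambda runs 100 times
        int afterFirstPass = Calls;      // 100

        var second = query.ToList();     // second pass: the lambda runs 100 more times
        int afterSecondPass = Calls;     // 200

        // Each pass produced fresh objects, so the two lists share no references.
        Console.WriteLine(ReferenceEquals(first[0], second[0])); // False

        return new[] { beforeEnumeration, afterFirstPass, afterSecondPass };
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Run())); // 0, 100, 200
    }
}
```

Every enumeration of the query re-runs the lambda, which is why ten passes over an unmaterialized Select produce ten separate sets of objects.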

I could reduce my problem to something simpler:

    var d = Enumerable.Range(1, 100);
    var f = d.Select(t => new Person());

Now, essentially, I am doing this:

 f = f.Concat(f); 

Note that the query has not been executed so far. At the time of execution, f is still the unexecuted d.Select(t => new Person()). So the last statement, when it is finally executed, can be broken down into:

    f = f.Concat(f);
    // which is equivalent to:
    // f = d.Select(t => new Person()).Concat(d.Select(t => new Person()));

which obviously creates 100 + 100 = 200 new Person instances. So

 f.Distinct().ToList(); //yields 200, not 100 

which is the correct behavior.
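The reasoning above can be checked with a small sketch that concatenates the query with itself, once lazily and once after materializing it. The ConcatDemo class and its materialize parameter are hypothetical, and Person is assumed to be an empty class that does not override Equals.

```csharp
using System;
using System.Linq;

// Assumed stand-in for the question's Person: no Equals override, so reference equality applies.
public class Person { }

public static class ConcatDemo
{
    // Counts the distinct Person references after concatenating the sequence with itself.
    public static int DistinctAfterSelfConcat(bool materialize)
    {
        var f = Enumerable.Range(1, 100).Select(t => new Person());

        if (materialize)
            f = f.ToList();   // run the Select once and keep the 100 objects

        f = f.Concat(f);      // both halves refer to the same underlying sequence

        return f.Distinct().ToList().Count;
    }

    public static void Main()
    {
        Console.WriteLine(DistinctAfterSelfConcat(materialize: false)); // 200
        Console.WriteLine(DistinctAfterSelfConcat(materialize: true));  // 100
    }
}
```

In the lazy case each half of the Concat re-runs the Select and creates its own 100 objects; in the materialized case both halves read the same list.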

Edit: I could rewrite the extension method as simply as:

    public static IEnumerable<T> Repeat<T>(this IEnumerable<T> source, int times)
    {
        source = source.ToArray();
        return Enumerable.Range(0, times).SelectMany(_ => source);
    }

I used dasblinkenlight's suggestion to fix the problem.
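As a quick check of the rewritten method, the following sketch wraps it in an assumed static class (SequenceExtensions, a made-up name) and runs the scenario from the question. Person is again assumed to be a plain class with no Equals override.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed stand-in for the question's Person type.
public class Person { }

public static class SequenceExtensions
{
    public static IEnumerable<T> Repeat<T>(this IEnumerable<T> source, int times)
    {
        source = source.ToArray(); // enumerate the input exactly once
        return Enumerable.Range(0, times).SelectMany(_ => source);
    }
}

public static class RepeatDemo
{
    // Returns the total element count and the number of distinct references.
    public static int[] Run()
    {
        var repeated = Enumerable.Range(1, 100)
            .Select(t => new Person())
            .Repeat(10)
            .ToList();

        return new[] { repeated.Count, repeated.Distinct().ToList().Count };
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Run())); // 1000, 100
    }
}
```

Because ToArray runs eagerly when Repeat is called, the 100 Person objects are created once and the SelectMany only replays that array. A negative times value is rejected by Enumerable.Range, which throws ArgumentOutOfRangeException.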


Each Person object is a separate object. All 1000 are different.

How is equality defined for the type Person? If you have not overridden it, the default is reference equality, which means that all 1000 objects are different.
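For comparison, here is a sketch of how Distinct behaves with and without an equality override. NamedPerson and its Name property are hypothetical and only illustrate value-based equality; the question's Person has no such member.

```csharp
using System;
using System.Linq;

// No override: Distinct falls back to reference equality.
public class Person { }

// Hypothetical type that compares by Name instead of by reference.
public sealed class NamedPerson
{
    public string Name { get; }

    public NamedPerson(string name) => Name = name;

    public override bool Equals(object obj) =>
        obj is NamedPerson other && other.Name == Name;

    public override int GetHashCode() => Name.GetHashCode();
}

public static class EqualityDemo
{
    public static int DistinctByReference() =>
        Enumerable.Range(0, 1000)
            .Select(i => new Person())
            .Distinct()
            .ToList().Count;

    public static int DistinctByName() =>
        Enumerable.Range(0, 1000)
            .Select(i => new NamedPerson("p" + (i % 100))) // only 100 different names
            .Distinct()
            .ToList().Count;

    public static void Main()
    {
        Console.WriteLine(DistinctByReference()); // 1000
        Console.WriteLine(DistinctByName());      // 100
    }
}
```

With an override like this, Distinct would collapse the 1000 objects to 100 even without materializing the sequence, although 1000 objects would still be allocated.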
