LINQ performance: should I use `Where` or `Select` first?

I have a large List in memory, holding objects of a class that has about 20 properties.

I would like to filter this list on just one property, and for this specific task I only need a list of that property's values. So my query looks something like this:

 data.Select(x => x.field).Where(x => x == "desired value").ToList() 

Which one gives me the best performance: using Select first (as above) or using Where first (as below)?

 data.Where(x => x.field == "desired value").Select(x => x.field).ToList() 

Please let me know whether the answer depends on the type of collection I keep the data in, or on the type of the field. Please note that I also need these objects for other tasks, so I cannot filter them up front, before loading them into memory.

performance c# linq linq-to-entities query-performance
2 answers

"Which one gives me the best performance, first using Select or using Where?"

The Where-first approach is more efficient, since it filters the collection first and then runs Select only on the elements that passed the filter.

Mathematically speaking, the Where-first approach takes N + N' operations, where N is the number of elements in the collection and N' is the number of elements that satisfy the Where condition.
Thus, it takes a minimum of N + 0 = N operations (if no element satisfies the Where condition) and a maximum of N + N = 2 * N operations (if every element satisfies it).

The Select-first approach, by contrast, always performs exactly 2 * N operations, since it projects the property from every object and then tests every projected value against the filter.
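To make the counting concrete, here is a minimal sketch on a tiny fixed list, so the numbers can be checked by hand. The class name `OperationCountDemo`, the method `CountCalls`, and the sample values are made up for illustration; "operations" here simply means lambda invocations.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OperationCountDemo
{
    // Counts how many times the lambdas are invoked for each ordering.
    // (Hypothetical helper, not part of the original question or answers.)
    public static (int WhereFirst, int SelectFirst) CountCalls(IReadOnlyList<int> values, int threshold)
    {
        int whereFirst = 0;
        values.Where(v => { whereFirst++; return v < threshold; })
              .Select(v => { whereFirst++; return v; })
              .ToList(); // force enumeration

        int selectFirst = 0;
        values.Select(v => { selectFirst++; return v; })
              .Where(v => { selectFirst++; return v < threshold; })
              .ToList(); // force enumeration

        return (whereFirst, selectFirst);
    }

    public static void Main()
    {
        // N = 6 elements, of which N' = 3 are below the threshold of 4.
        var values = new[] { 1, 5, 9, 2, 8, 3 };
        var (whereFirst, selectFirst) = CountCalls(values, 4);

        Console.WriteLine($"Where -> Select: {whereFirst} calls");  // N + N' = 9
        Console.WriteLine($"Select -> Where: {selectFirst} calls"); // 2 * N = 12
    }
}
```

With N = 6 and N' = 3, Where-first makes 6 + 3 = 9 lambda calls and Select-first makes 2 * 6 = 12, matching the formulas above.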

Test as proof

I wrote a test to confirm my answer.

Results:

    Condition value: 50
    Where -> Select: 88 ms, 10500319 hits
    Select -> Where: 137 ms, 20000000 hits

    Condition value: 500
    Where -> Select: 187 ms, 14999212 hits
    Select -> Where: 238 ms, 20000000 hits

    Condition value: 950
    Where -> Select: 186 ms, 19500126 hits
    Select -> Where: 402 ms, 20000000 hits

If you run the test many times, you will see that the hit count of the Where -> Select approach varies slightly from run to run (the data is random), while the Select -> Where approach always performs exactly 2N operations.
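The measured hit counts can be compared against the N + N' formula. The sketch below assumes, as in the test code, that the values come from `random.Next(1000)` and are therefore roughly uniform over 0..999, so about `conditionValue / 1000` of the elements pass the filter. The class `ExpectedHits` and its methods are hypothetical names used only for this check.

```csharp
using System;

public static class ExpectedHits
{
    // Expected lambda calls for Where -> Select, assuming values are
    // uniformly distributed in [0, range): N + N * conditionValue / range.
    public static long WhereFirst(long n, int conditionValue, int range)
        => n + n * conditionValue / range;

    // Select -> Where always invokes both lambdas once per element.
    public static long SelectFirst(long n) => 2 * n;

    public static void Main()
    {
        const long n = 10_000_000; // collection size used in the test

        foreach (int c in new[] { 50, 500, 950 })
        {
            Console.WriteLine(
                $"Condition value {c}: Where -> Select ~ {WhereFirst(n, c, 1000)}, " +
                $"Select -> Where = {SelectFirst(n)}");
        }
    }
}
```

This gives expected counts of 10,500,000, 15,000,000 and 19,500,000 for Where -> Select, which is close to the measured 10500319, 14999212 and 19500126 hits.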

IDEOne demo:

https://ideone.com/jwZJLt

The code:

    using System;
    using System.Collections.Generic;
    using System.Diagnostics;
    using System.Linq;

    class Point
    {
        public int X { get; set; }
        public int Y { get; set; }
    }

    class Program
    {
        static void Main()
        {
            var random = new Random();
            List<Point> points = Enumerable.Range(0, 10000000)
                .Select(x => new Point { X = random.Next(1000), Y = random.Next(1000) })
                .ToList();

            int conditionValue = 250;
            Console.WriteLine($"Condition value: {conditionValue}");

            Stopwatch sw = new Stopwatch();
            sw.Start();

            // Where -> Select: count every lambda invocation.
            // Filters on X here and on Y below; both are uniform in [0, 1000).
            int hitCount1 = 0;
            var points1 = points
                .Where(x => { hitCount1++; return x.X < conditionValue; })
                .Select(x => { hitCount1++; return x.Y; })
                .ToArray();
            sw.Stop();
            Console.WriteLine($"Where -> Select: {sw.ElapsedMilliseconds} ms, {hitCount1} hits");

            sw.Restart();

            // Select -> Where: count every lambda invocation.
            int hitCount2 = 0;
            var points2 = points
                .Select(x => { hitCount2++; return x.Y; })
                .Where(x => { hitCount2++; return x < conditionValue; })
                .ToArray();
            sw.Stop();
            Console.WriteLine($"Select -> Where: {sw.ElapsedMilliseconds} ms, {hitCount2} hits");

            Console.ReadLine();
        }
    }

Related Questions

These questions may also be of interest to you. They are not specifically about Select and Where, but they deal with how the order of LINQ operations affects performance:

  • Does the order of LINQ functions matter?
  • Order of LINQ extension methods does not affect performance?


The answer will depend on the state of your collection.

  • If most objects pass the Where test, apply Select first;
  • If few objects pass the Where test, apply Where first.

Update:

@YeldarKurmangaliyev wrote an answer with a concrete example and benchmark. I ran similar code to verify his claim, and our results are exactly the opposite. The reason is that I ran the same test as he did, but with an object that is not as simple as the Point type he used for his tests.

The code is very similar to his, except that I changed the class name from Point to EnumerableClass.

Given below are the classes I used for the EnumerableClass class:

    using System.Drawing; // for Color

    public class EnumerableClass
    {
        public int X { get; set; }
        public int Y { get; set; }
        public String A { get; set; }
        public String B { get; set; }
        public String C { get; set; }
        public String D { get; set; }
        public String E { get; set; }
        public Frame F { get; set; }
        public Gatorade Gatorade { get; set; }
        public Home Home { get; set; }
    }

    public class Home
    {
        private Home(int rooms, double bathrooms, Stove stove, InternetConnection internetConnection)
        {
            Rooms = rooms;
            Bathrooms = (decimal) bathrooms;
            StoveType = stove;
            Internet = internetConnection;
        }

        public int Rooms { get; set; }
        public decimal Bathrooms { get; set; }
        public Stove StoveType { get; set; }
        public InternetConnection Internet { get; set; }

        public static Home GetUnitOfHome()
        {
            return new Home(5, 2.5, Stove.Gas, InternetConnection.Att);
        }
    }

    public enum InternetConnection { Comcast = 0, Verizon = 1, Att = 2, Google = 3 }

    public enum Stove { Gas = 0, Electric = 1, Induction = 2 }

    public class Gatorade
    {
        private Gatorade(int volume, Color liquidColor, int bottleSize)
        {
            Volume = volume;
            LiquidColor = liquidColor;
            BottleSize = bottleSize;
        }

        public int Volume { get; set; }
        public Color LiquidColor { get; set; }
        public int BottleSize { get; set; }

        public static Gatorade GetGatoradeBottle()
        {
            return new Gatorade(100, Color.Orange, 150);
        }
    }

    public class Frame
    {
        public int X { get; set; }
        public int Y { get; set; }

        private Frame(int x, int y)
        {
            X = x;
            Y = y;
        }

        public static Frame GetFrame()
        {
            return new Frame(5, 10);
        }
    }

The Frame, Gatorade and Home classes each have a static method that returns an instance of the respective type.

The following is the main program:

    using System;
    using System.Diagnostics;
    using System.Linq;

    public static class Program
    {
        const string Chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

        private static readonly Random Random = new Random();

        // Builds a random string of the given length from Chars.
        private static string RandomString(int length)
        {
            return new string(Enumerable.Repeat(Chars, length)
                .Select(s => s[Random.Next(s.Length)]).ToArray());
        }

        private static void Main()
        {
            var random = new Random();
            var largeCollection = Enumerable.Range(0, 1000000)
                .Select(x => new EnumerableClass
                {
                    A = RandomString(500),
                    B = RandomString(1000),
                    C = RandomString(100),
                    D = RandomString(256),
                    E = RandomString(1024),
                    F = Frame.GetFrame(),
                    Gatorade = Gatorade.GetGatoradeBottle(),
                    Home = Home.GetUnitOfHome(),
                    X = random.Next(1000),
                    Y = random.Next(1000)
                })
                .ToList();

            const int conditionValue = 250;
            Console.WriteLine(@"Condition value: {0}", conditionValue);

            var sw = new Stopwatch();
            sw.Start();
            var firstWhere = largeCollection
                .Where(x => x.Y < conditionValue)
                .Select(x => x.Y)
                .ToArray();
            sw.Stop();
            Console.WriteLine(@"Where -> Select: {0} ms", sw.ElapsedMilliseconds);

            sw.Restart();
            var firstSelect = largeCollection
                .Select(x => x.Y)
                .Where(y => y < conditionValue)
                .ToArray();
            sw.Stop();
            Console.WriteLine(@"Select -> Where: {0} ms", sw.ElapsedMilliseconds);

            Console.ReadLine();
            Console.WriteLine();
            Console.WriteLine(@"First Where first item: {0}", firstWhere.FirstOrDefault());
            Console.WriteLine(@"First Select first item: {0}", firstSelect.FirstOrDefault());
            Console.WriteLine();
            Console.ReadLine();
        }
    }

Results:

I ran the tests several times and found that .Select().Where() performs better than .Where().Select() when the collection size is 1,000,000.


Here is the first test result, where I forced the Y value of every EnumerableClass object to 5, so that every element passes the Where condition:

    Condition value: 250
    Where -> Select: 149 ms
    Select -> Where: 115 ms

    First Where first item: 5
    First Select first item: 5

Here is the second test result, where I forced the Y value of every EnumerableClass object to 251, so that no element passes the Where condition:

    Condition value: 250
    Where -> Select: 110 ms
    Select -> Where: 100 ms

    First Where first item: 0
    First Select first item: 0

Clearly, the result depends heavily on the state of the collection:

  • In @YeldarKurmangaliyev's tests, .Where().Select() performs better; and,
  • In my tests, .Select().Where() performs better.

The state of the collection, which I keep referring to, includes:

  • the size of each item;
  • the total number of items in the collection; and,
  • the number of elements that pass the Where clause.
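Since the outcome depends on all of these factors, it may be worth measuring both orderings on your own data. Single Stopwatch readings are sensitive to JIT warm-up and garbage collection, so the sketch below runs each query once untimed and then reports the median of several timed runs. The names `OrderBenchmark`, `MedianMilliseconds` and `Item` are hypothetical, and the data is a stand-in for your real collection; treat it as a starting point rather than a rigorous benchmark.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public class Item
{
    public int Y { get; set; } // stand-in for the single property being filtered
}

public static class OrderBenchmark
{
    // Runs the action once untimed (warm-up), then returns the median
    // elapsed time in milliseconds over `runs` timed executions.
    public static double MedianMilliseconds(Action action, int runs = 5)
    {
        action(); // warm-up run, not timed

        var timings = new List<double>();
        for (int i = 0; i < runs; i++)
        {
            var sw = Stopwatch.StartNew();
            action();
            sw.Stop();
            timings.Add(sw.Elapsed.TotalMilliseconds);
        }

        timings.Sort();
        return timings[timings.Count / 2];
    }

    public static void Main()
    {
        var random = new Random(42); // fixed seed so both orderings see the same data
        List<Item> data = Enumerable.Range(0, 1_000_000)
            .Select(_ => new Item { Y = random.Next(1000) })
            .ToList();
        const int conditionValue = 250;

        double whereFirst = MedianMilliseconds(() =>
        {
            data.Where(x => x.Y < conditionValue).Select(x => x.Y).ToArray();
        });
        double selectFirst = MedianMilliseconds(() =>
        {
            data.Select(x => x.Y).Where(y => y < conditionValue).ToArray();
        });

        Console.WriteLine($"Where -> Select: {whereFirst:F1} ms (median)");
        Console.WriteLine($"Select -> Where: {selectFirst:F1} ms (median)");
    }
}
```

Swapping in your own class, collection size and filter selectivity shows which ordering wins for your particular case.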

Response to a comment on this answer:

In addition, @Enigmativity said that having to know the result of Where in advance, in order to decide whether to put Where first or Select first, is a Catch-22. Ideally and in theory he is right, and unsurprisingly the same situation arises in another area of computer science: scheduling.

The best scheduling algorithm is Shortest Job First, where the job that will finish in the shortest time is scheduled first. But how would anyone know how long a particular job will take? Well, the answer is:

"Shortest job next is used in specialized environments where accurate estimates of running time are available."

Therefore, as I said right at the top (which was also the first, shorter version of my answer), the correct answer to this question depends on the current state of the collection.

In general,

  • if your objects are within a reasonable size range; and,
  • you are selecting only a very small piece of each object; and,
  • your collection is larger than just a few thousand items,

then the guideline given at the top of this answer will be useful to you.
