What is the bottleneck in the Linq.Max implementation?

Question

What is the bottleneck in the Linq.Max implementation?

Preamble: I changed the code a bit (manual max. Search in an array) to some Linq.Max() super-sexy written string that leads me to performance questions (I often deal with large arrays). Therefore, I made a small program for testing, because I trust only what I see and get the following results:

 The size is now of 1 elements With the loop it took: 00:00:00.0000015 With Linq it took: 00:00:00.0000288 The loop is faster: 94,79% ----------------------------------------- The size is now of 10 elements With the loop it took: 00:00:00 With Linq it took: 00:00:00.0000007 The loop is faster: 100,00% ----------------------------------------- The size is now of 100 elements With the loop it took: 00:00:00 With Linq it took: 00:00:00.0000011 The loop is faster: 100,00% ----------------------------------------- The size is now of 1 000 elements With the loop it took: 00:00:00.0000003 With Linq it took: 00:00:00.0000078 The loop is faster: 96,15% ----------------------------------------- The size is now of 10 000 elements With the loop it took: 00:00:00.0000063 With Linq it took: 00:00:00.0000765 The loop is faster: 91,76% ----------------------------------------- The size is now of 100 000 elements With the loop it took: 00:00:00.0000714 With Linq it took: 00:00:00.0007602 The loop is faster: 90,61% ----------------------------------------- The size is now of 1 000 000 elements With the loop it took: 00:00:00.0007669 With Linq it took: 00:00:00.0081737 The loop is faster: 90,62% ----------------------------------------- The size is now of 10 000 000 elements With the loop it took: 00:00:00.0070811 With Linq it took: 00:00:00.0754348 The loop is faster: 90,61% ----------------------------------------- The size is now of 100 000 000 elements With the loop it took: 00:00:00.0788133 With Linq it took: 00:00:00.7758791 The loop is faster: 89,84%

In short, Linq is almost 10 times slower, which bothered me a bit, so I looked at the implementation of Max()

 public static int Max(this IEnumerable<int> source) { if (source == null) throw Error.ArgumentNull("source"); int value = 0; bool hasValue = false; foreach (int x in source) { if (hasValue) { if (x > value) value = x; } else { value = x; hasValue = true; } } if (hasValue) return value; throw Error.NoElements(); }

As the title says, what in this implementation makes it 10 times slower? (And it's not about ForEach , I checked)

Edit:

Of course I'm testing in Release mode.

Here is my test code (no output):

 //---------------- private int[] arrDoubles; //---------------- Stopwatch watch = new Stopwatch(); //Stop a 100 Millions to avoid memory overflow on my laptop for (int i = 1; i <= 100000000; i = i * 10) { fillArray(i); watch.Restart(); int max = Int32.MinValue; // Reset for (int j = 0; j < arrDoubles.Length; j++) { max = Math.Max(arrDoubles[j], max); } watch.Stop(); TimeSpan loopSpan = watch.Elapsed; watch.Restart(); max = Int32.MinValue; // Reset max = arrDoubles.Max(); watch.Stop(); TimeSpan linqSpan = watch.Elapsed; } //------------------------------------------- private void fillArray(int nbValues) { int Min = Int32.MinValue; int Max = Int32.MaxValue; Random randNum = new Random(); arrDoubles = Enumerable.Repeat(0, nbValues).Select(i => randNum.Next(Min, Max)).ToArray(); }

+7

performance c # linq

Thomas Ayoub Jul 21 '15 at 13:26

source share

2 answers

Comparing two integers quickly on any scale (just a few cycles). However, enumeration takes longer, so you can say that LINQ spends more time on enumeration than comparison - although your code has direct access to any number in the array.

+2

shay__ Jul 21 '15 at 13:51

source share

Matthew watson · Accepted Answer · 2015-07-21T13:46:10+0000

This is probably because access to the array through IEnumerable<> much slower than accessing it from the actual array type (even when using foreach ).

The following code demonstrates this. Note that the code in max1() and max2() is identical; the only difference is the type of the array parameter. Both methods are passed by the same object during testing.

Try to run it from the RELEASE assembly (and not under the debugger, which will include debugging code even for the release build):

 using System; using System.Collections.Generic; using System.Diagnostics; namespace Demo { public class Program { private static void Main(string[] args) { var array = new int[100000000]; var sw = new Stopwatch(); for (int trial = 0; trial < 8; ++trial) { sw.Restart(); for (int i = 0; i < 10; ++i) max1(array); var elapsed1 = sw.Elapsed; Console.WriteLine("int[] took " + elapsed1); sw.Restart(); for (int i = 0; i < 10; ++i) max2(array); var elapsed2 = sw.Elapsed; Console.WriteLine("IEnumerable<int> took " + elapsed2); Console.WriteLine("\nFirst method was {0} times faster.\n", elapsed2.TotalSeconds / elapsed1.TotalSeconds); } } private static int max1(int[] array) { int result = int.MinValue; foreach (int n in array) if (n > result) result = n; return result; } private static int max2(IEnumerable<int> array) { int result = int.MinValue; foreach (int n in array) if (n > result) result = n; return result; } } }

On my PC, the int[] version is about 10 times faster than the IEnumerable<int> version.

What is the bottleneck in the Linq.Max implementation?

Edit:

More articles: