Parallel LINQ Query Optimization

For some time I have been structuring my code around methods with no side effects so that I can use parallel LINQ to speed things up. Along the way I have more than once been caught out by lazy evaluation making things worse rather than better, and I would like to know whether there are any tools for optimizing parallel LINQ queries.

I ask because I recently reorganized some embarrassingly parallel code: I changed a few methods and sprinkled AsParallel in certain key places. The running time dropped from 2 minutes to 45 seconds, but it was clear from the performance monitor that in some places the processor cores were not fully used. After several false starts, I forced some of the queries to execute with ToArray, and the running time dropped further, to 16 seconds. It was nice to cut the execution time, but it was also a bit unsettling, because it was not obvious where in the query chains ToArray needed to be forced. Waiting until the last moment to evaluate the query was not the optimal strategy, but it was unclear at which points some of the subqueries had to be evaluated eagerly so that all CPU cores would be used.

As it stands, I have no idea how to sprinkle in ToArray, or other methods that force LINQ evaluation, to get maximum CPU utilization. So: are there any general guidelines or tools for optimizing parallel LINQ queries?

Here is an example of pseudo code:

    var firstQuery = someDictionary.SelectMany(FirstTransformation);
    var secondQuery = firstQuery.Select(SecondTransformation);
    var thirdQuery = secondQuery.Select(ThirdTransformation).Where(SomeConditionCheck);
    var finalQuery = thirdQuery.Select(FinalTransformation).Where(x => x != null);

FirstTransformation, SecondTransformation and ThirdTransformation are all CPU-bound; in terms of complexity they are a few 3x3 matrix multiplications and some if branches. SomeConditionCheck is little more than a null check. FinalTransformation is the most CPU-intensive part of the code: it performs a whole bunch of line-plane intersections, checks polygon containment for those intersections, and then extracts the intersection closest to a specific point on the line.

I have no idea why the places where I put AsParallel reduced the execution time as much as they did. I have now reached a local minimum in running time, but I have no idea why; I was simply lucky to stumble upon it. If you are curious, the places where AsParallel goes are the first and last lines. Putting AsParallel anywhere else only increases the execution time, sometimes by as much as 20 seconds. There is also a hidden ToArray in the first line.

+13
c# parallel-processing linq plinq
Feb 04 '12
2 answers

There are a couple of things going on here:

  • PLINQ parallelizes collections more efficiently than unindexed IEnumerables. If you have an array, it divides the array length by the number of processor cores and hands each core an even share of the work. But if you have an IEnumerable of unknown length, it does a dumb exponential ramp-up, where the spawned tasks process 1, 2, 4, 8, etc. elements at a time until it reaches the end of the IEnumerable. (The sketch after this list illustrates the difference.)
  • By parallelizing all of your queries, you break the work into tiny pieces. If you have M parallel queries over N elements, you end up with M * N tasks. There is more overhead in that than if you parallelized only the last query, in which case you would have just N tasks.
  • PLINQ does best when each task takes roughly the same amount of time to process, because then it can divide them evenly among the cores. By parallelizing each of your queries, which all have different performance characteristics, you have M * N tasks that take varying amounts of time, and PLINQ cannot schedule them optimally (because it does not know in advance how long each one will take).
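To make the first point concrete, here is a minimal sketch of the partitioning difference. The Work function and the collection size are made up for illustration; the only thing that matters is that the same data is handed to PLINQ once as an array (length known, so it can range-partition) and once behind a pass-through Where (length hidden, so it must fall back to chunking):

    using System;
    using System.Diagnostics;
    using System.Linq;

    class PartitioningDemo
    {
        // Stand-in CPU-bound work; any math-heavy function will do.
        static double Work(int x)
        {
            return Math.Sqrt(x) * Math.Sin(x);
        }

        static void Time(string label, Action action)
        {
            var sw = Stopwatch.StartNew();
            action();
            Console.WriteLine("{0}: {1} ms", label, sw.ElapsedMilliseconds);
        }

        static void Main()
        {
            int[] array = Enumerable.Range(0, 5 * 1000 * 1000).ToArray();
            // Where(x => true) returns the same elements but hides the length,
            // so PLINQ can no longer range-partition the source.
            var unknownLength = array.Where(x => true);

            Time("array source (range partitioned)",
                () => array.AsParallel().Select(Work).ToArray());
            Time("IEnumerable source (chunk partitioned)",
                () => unknownLength.AsParallel().Select(Work).ToArray());
        }
    }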

So the general guideline here is: make sure you have an array before starting, if at all possible, and put AsParallel only on the very last query, just before it is evaluated. Something like the following should work very well:

    var firstQuery = someDictionary.SelectMany(FirstTransformation).ToArray();
    var secondQuery = firstQuery.Select(SecondTransformation);
    var thirdQuery = secondQuery.Select(ThirdTransformation).Where(SomeConditionCheck).ToArray();
    var finalQuery = thirdQuery.AsParallel().Select(FinalTransformation).Where(x => x != null);
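If cores still sit idle after restructuring, PLINQ's standard tuning operators, WithDegreeOfParallelism and WithExecutionMode, are worth a try. A sketch only, reusing the hypothetical thirdQuery and FinalTransformation from the example above:

    var finalQuery = thirdQuery
        .AsParallel()
        .WithDegreeOfParallelism(Environment.ProcessorCount)        // explicit worker count
        .WithExecutionMode(ParallelExecutionMode.ForceParallelism)  // skip PLINQ's "is it worth it?" heuristic
        .Select(FinalTransformation)
        .Where(x => x != null)
        .ToArray();                                                 // force evaluation right here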
+14
Feb 06 '12 at 19:50

It is almost impossible to say without seeing the actual code. But as a general guideline, you should consider avoiding P/LINQ for heavy number crunching, because the per-item delegate and IEnumerable overhead is too high. The speedup you gain from threads is very likely eaten up by the convenient abstractions LINQ provides.

Here is some code that computes the element-wise sum of two integer lists, converts some ints to float to divide by, and then calculates the cosine of the result. Pretty simple stuff that you can do nicely with the LINQ Zip operator ... or the old-fashioned way with a for loop.

Update 1: added a parallel LINQ version; the numbers below are from my 8-core Haswell machine:

  • Linq 0.95s
  • Linq Parallel 0.19s
  • Optimized 0.45s
  • Optimized Parallel 0.08s

End of Update 1.

Original measurements (dual core, details below):

  • LINQ 1.65s
  • Optimized 0.64s
  • Optimized Parallel 0.40s

The time difference is almost a factor of 3, caused by IEnumerable laziness and the method-call overhead (measured in Release mode, x32, on Windows 7, .NET 4, dual core). I tried using AsParallel in the LINQ version, but it actually got slower (2.3s). If your problem is data-parallel, you should use the Parallel.For construct to get good scalability. IEnumerable itself is a poor candidate for parallelization because:

  • You do not know how much work there is until you have iterated to the end.
  • You cannot partition the work up front, because you do not know how quickly the IEnumerable will produce the next element (it might be a web service call or an array index access).

The sample code below illustrates the point. If you want to optimize closer to the bare metal, you first need to get rid of the abstractions that cost too much per item: indexing into an array is much cheaper than going through an enumerator's MoveNext() and Current calls.

    using System;
    using System.Collections.Generic;
    using System.Diagnostics;
    using System.Linq;
    using System.Threading.Tasks;

    class Program
    {
        static void Main(string[] args)
        {
            var A = new List<int>(Enumerable.Range(0, 10 * 1000 * 1000));
            var B = new List<int>(Enumerable.Range(0, 10 * 1000 * 1000));

            double[] Actual = UseLinq(A, B);
            double[] pActual = UseLinqParallel(A, B);
            var other = Optimized(A, B);
            var pother = OptimizedParallel(A, B);
        }

        private static double[] UseLinq(List<int> A, List<int> B)
        {
            var sw = Stopwatch.StartNew();
            var Merged = A.Zip(B, (a, b) => a + b);
            var Converted = A.Select(a => (float)a);
            var Result = Merged.Zip(Converted, (m, c) => Math.Cos((double)m / ((double)c + 1)));
            double[] Actual = Result.ToArray(); // lazy pipeline is evaluated here
            sw.Stop();
            Console.WriteLine("Linq {0:F2}s", sw.Elapsed.TotalSeconds);
            return Actual;
        }

        private static double[] UseLinqParallel(List<int> A, List<int> B)
        {
            var sw = Stopwatch.StartNew();
            var x = A.AsParallel();
            var y = B.AsParallel();
            var Merged = x.Zip(y, (a, b) => a + b);
            var Converted = x.Select(a => (float)a);
            var Result = Merged.Zip(Converted, (m, c) => Math.Cos((double)m / ((double)c + 1)));
            double[] Actual = Result.ToArray();
            sw.Stop();
            Console.WriteLine("Linq Parallel {0:F2}s", sw.Elapsed.TotalSeconds);
            return Actual;
        }

        private static double[] OptimizedParallel(List<int> A, List<int> B)
        {
            double[] result = new double[A.Count];
            var sw = Stopwatch.StartNew();
            Parallel.For(0, A.Count, (i) =>
            {
                var sum = A[i] + B[i];
                result[i] = Math.Cos((double)sum / ((double)((float)A[i]) + 1));
            });
            sw.Stop();
            Console.WriteLine("Optimized Parallel {0:F2}s", sw.Elapsed.TotalSeconds);
            return result;
        }

        private static double[] Optimized(List<int> A, List<int> B)
        {
            double[] result = new double[A.Count];
            var sw = Stopwatch.StartNew();
            for (int i = 0; i < A.Count; i++)
            {
                var sum = A[i] + B[i];
                result[i] = Math.Cos((double)sum / ((double)((float)A[i]) + 1));
            }
            sw.Stop();
            Console.WriteLine("Optimized {0:F2}s", sw.Elapsed.TotalSeconds);
            return result;
        }
    }
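Note how the Parallel.For versions get away without any locking: each iteration writes only to its own slot, result[i], and the result array can be preallocated because List<int>.Count is known up front. That is exactly the information a lazy IEnumerable pipeline withholds, which is why PLINQ has to fall back to chunked scheduling there.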
+7
Feb 06 '12


