Is it possible to perform Left reduction in parallel?

I just started to learn Scala, so please be patient :-)

I have a question about how reduceLeft behaves. Here is an example:

List(1, 2, 3, 4, 5) reduceLeft (_ + _) 

I wonder if the calculation can be performed simultaneously, for example:

first round:

  • process 1 calculates: 1 + 2
  • process 2 calculates: 4 + 5

second round:

  • process 1 calculates: 3 + 3

third round:

  • process 1 calculates: 6 + 9

At least this is what I would expect if I just used the reduce function instead of reduceLeft. Or does Left really only make one reduction at a time?

 ((((1 + 2) + 3) + 4) + 5) 

This would mean that it cannot be executed in parallel, and one should always want to reduce with reduceLeft / Right, if possible?

+6
source share
2 answers

The answer is yes, and it is very simple:

 List(1, 2, 3, 4, 5).par.reduce (_ + _) 

The par method turns a list into a parallel collection. When you call reduce in this parallel collection, it will execute in parallel.

See parallel collection documentation

+8
source

As you noticed, reduceLeft not parallelizable, since it explicitly takes a form that is not associative: (B,A) => B

As long as you use the associative operator, reduce is parallelizable.

There is also an analogue to foldLeft , called aggregate , which takes two functions: one for displaying in a combinable form and two associative for combining elements: (B,A)=>B, (B,B) => B

This one, while the two functions agree to the output, and you have zero to move to where you want, is parallelizable.

So, if you want to be parallel,

 reduceLeft/Right -> reduce foldLeft/Right -> aggregate 

There may be times when reduce more restrictive than reduceLeft , but aggregate will do the trick.

However, this only makes the statement capable of being parallel. For its actual parallelism, you need to use a collection that inherits from ParIterable , and they all have Par in their names: ParVector , etc. The easiest way to get a parallel collection is to call .par on the regular one ( .seq goes the other way, from parallel to non-parallel). This was done so, because there is generally no reason to be parallel, except for speed, but parallelism adds additional overhead. Therefore, you should only work in parallel if there is enough work, and although you may know this, the compiler probably does not. Therefore, you must explicitly choose which collection you want. (Parallel collections return parallel and consecutive inverse serial.)

+4
source

All Articles