Why a combiner is needed for a reduction method that converts a type in java 8

I'm having trouble fully understanding the role that combiner in the Streams reduce method.

For example, the following code does not compile:

 int length = asList("str1", "str2").stream() .reduce(0, (accumulatedInt, str) -> accumulatedInt + str.length()); 

A compilation error says: (argument mismatch; int cannot be converted to java.lang.String)

but this code compiles:

 int length = asList("str1", "str2").stream() .reduce(0, (accumulatedInt, str ) -> accumulatedInt + str.length(), (accumulatedInt, accumulatedInt2) -> accumulatedInt + accumulatedInt2); 

I understand that the combiner method is used in parallel threads, so in my example it combines two intermediate accumulated ints.

But I don’t understand why the first example does not compile without a combiner or how the combiner decides to convert a string to int, because it just adds two ints.

Can anyone shed some light on this?

+64
java java-8 java-stream
Jun 19 '14 at 13:45
source share
4 answers

The two and three versions of the reduce arguments that you tried to use do not accept the same type for accumulator .

The two reduce arguments are defined as :

 T reduce(T identity, BinaryOperator<T> accumulator) 

In your case, T is String, so BinaryOperator<T> should take two String arguments and return a String. But you pass it int and String, which leads to a compilation error - argument mismatch; int cannot be converted to java.lang.String argument mismatch; int cannot be converted to java.lang.String . Actually, I think the transfer is 0, since the identity value here is also incorrect, since String (T) is expected.

Also note that this version reduces the stream process Ts and returns T, so you cannot use it to reduce the stream stream to int.

The three reduce arguments are defined as :

 <U> U reduce(U identity, BiFunction<U,? super T,U> accumulator, BinaryOperator<U> combiner) 

In your case, U is Integer and T is String, so this method will reduce the String stream to Integer.

For battery BiFunction<U,? super T,U> BiFunction<U,? super T,U> you can pass parameters of two different types (U and? super T), which in your case are Integer and String. In addition, the value of U identity takes an integer in your case, so passing it 0 is fine.

Another way to achieve what you want:

 int length = asList("str1", "str2").stream().mapToInt (s -> s.length()) .reduce(0, (accumulatedInt, len) -> accumulatedInt + len); 

Here, the stream type corresponds to the reduce return type, so you can use two versions of the reduce parameters.

Of course, you do not need to use reduce at all:

 int length = asList("str1", "str2").stream().mapToInt (s -> s.length()) .sum(); 
+40
Jun 19 '14 at 14:24
source share

Eran’s answer described the differences between the reduce versions of the two-arg and three-arg versions, in which the former reduces Stream<T> to T , while the latter decreases Stream<T> to U However, in reality this did not explain the need for an additional combiner function when reducing Stream<T> to U

One of the principles of designing Streams APIs is that the API should not differentiate between serial and parallel threads, or else, a specific API should not prevent the flow from working sequentially or in parallel. If your lambdas have the correct properties (associative, non-interfering, etc.), a thread executed sequentially or in parallel should give the same results.

Let's first consider a shorthand version with two arguments:

 T reduce(I, (T, T) -> T) 

Consistent implementation is simple. The identity value I “accumulates” with the zero-flow element to give a result. This result accumulates with the first element of the flow to give another result, which, in turn, accumulates with the second element of the flow, etc. After the last element is accumulated, the final result is returned.

Parallel implementation begins by dividing the stream into segments. Each segment is processed by its own stream in the sequential manner described above. Now, if we have N threads, we have N intermediate results. They need to be reduced to one result. Since each intermediate result is of type T, and we have several, we can use the same accumulator function to reduce the intermediate results of N to one result.

Now consider a hypothetical operation to reduce two arguments, which reduces Stream<T> to U In other languages, this is called the “fold” or “fold-left” operation, so I will name it here. Please note that this does not exist in Java.

 U foldLeft(I, (U, T) -> U) 

(Note that the identity value of I is of type U.)

The serial version of foldLeft similar to the serial version of reduce , except that the intermediate values ​​are of type U instead of type T. But the rest is the same. (The hypothetical foldRight operation would be similar, except that the operations would be performed from right to left, and not from left to right.)

Now consider the parallel version of foldLeft . Let the separation of the flow into segments begin. Then we can have that each of the N threads decreases the values ​​of T in its segment by N intermediate values ​​of type U. Now what? How do we get from N values ​​of type U to one result of type U?

There is no other function that combines several intermediate results of type U into a single result of type U. If we have a function that combines two values ​​of U into one that are sufficient to reduce any number of values ​​to one, just like the original reduction above . Thus, the reduction operation, which gives a different type of result, needs two functions:

 U reduce(I, (U, T) -> U, (U, U) -> U) 

Or using Java syntax:

 <U> U reduce(U identity, BiFunction<U,? super T,U> accumulator, BinaryOperator<U> combiner) 

So, for parallel reduction for another type of result, we need two functions: one that accumulates the elements of T to intermediate values ​​of U, and the second that combines the intermediate values ​​of U into a single result of U. If we do not switch types, it turns out that the battery function is same as combiner function. The fact that reduction to the same type has only the function of the battery, and reduction to another type requires separate functions of the battery and combiner.

Finally, Java does not provide the operations foldLeft and foldRight , because they imply a specific order of operations, which is essentially sequential. This contradicts the design principle described above for providing APIs that equally support serial and parallel operation.

+100
Jun 19 '14 at 21:20
source share

Since I like scribbles and arrows to clarify the concepts ... let start!

From String to String (serial stream)

Suppose you have 4 lines: your goal is to combine such lines into one. Basically you start with a type and end with the same type.

You can achieve this with

 String res = Arrays.asList("one", "two","three","four") .stream() .reduce("", (accumulatedStr, str) -> accumulatedStr + str); //accumulator 

and this will help you understand what is going on:

enter image description here

The battery function converts, step by step, the elements in your (red) stream to the final reduced (green) value. The battery function simply converts the String object to another String .

String to int (parallel thread)

Suppose you have the same 4 lines: your new goal is to sum their length and you want to parallelize your stream.

What you need is:

 int length = Arrays.asList("one", "two","three","four") .parallelStream() .reduce(0, (accumulatedInt, str) -> accumulatedInt + str.length(), //accumulator (accumulatedInt, accumulatedInt2) -> accumulatedInt + accumulatedInt2); //combiner 

and this is a diagram of what happens

enter image description here

Here, the battery function (a BiFunction ) allows you to convert your String data into int data. Being a parallel stream, it is divided into two (red) parts, each of which is developed independently of each other and produces the same partial (orange) results. The definition of a combiner is necessary to provide a rule for combining partial int results into a final (green) int one.

String to int (serial stream)

What if you do not want to parallelize your stream? Well, in any case, the combine must be provided, but it will never be called if no partial results are obtained.

+51
Nov 28 '15 at 12:44
source share

There is no version of the abbreviation that accepts two different types without a combiner, since it cannot be executed in parallel (not sure why this requirement). The fact that the battery must be associative makes this interface almost useless, because:

 list.stream().reduce(identity, accumulator, combiner); 

It produces the same results as:

 list.stream().map(i -> accumulator(identity, i)) .reduce(identity, combiner); 
0
Sep 04 '15 at 7:39
source share



All Articles