In the Stream.reduce method, must the identity always be 0 for sum and 1 for multiplication?

I am continuing to learn Java 8.

I found an interesting behavior:

Here is some example code:

    // identity value and accumulator and combiner
    Integer summaryAge = Person.getPersons().stream()
            //.parallel()  //will return surprising result
            .reduce(1, (intermediateResult, p) -> intermediateResult + p.age,
                    (ir1, ir2) -> ir1 + ir2);
    System.out.println(summaryAge);

And the model class:

    import java.util.ArrayList;
    import java.util.Collection;
    import java.util.List;

    public class Person {
        String name;
        Integer age;
        ///...

        public static Collection<Person> getPersons() {
            List<Person> persons = new ArrayList<>();
            persons.add(new Person("Vasya", 12));
            persons.add(new Person("Petya", 32));
            persons.add(new Person("Serj", 10));
            persons.add(new Person("Onotole", 18));
            return persons;
        }
    }

12 + 32 + 10 + 18 = 72
For a sequential stream, this code always returns 73 (72 + 1), but for a parallel stream it always returns 76 (72 + 4 * 1), where 4 is the number of elements in the stream.
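To reproduce the behavior without the Person class, here is a minimal sketch that assumes the ages can be reduced directly as a List<Integer>. The class and method names (WrongIdentityDemo, sumAges) are illustrative only. It runs the same three-argument reduce sequentially and in parallel with a configurable identity.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical simplification of the question: Person objects replaced by their ages.
class WrongIdentityDemo {

    static final List<Integer> AGES = Arrays.asList(12, 32, 10, 18);

    // Same shape as the question's reduce call, but the identity is a parameter.
    static int sumAges(int identity, boolean parallel) {
        return (parallel ? AGES.parallelStream() : AGES.stream())
                .reduce(identity,
                        (intermediateResult, age) -> intermediateResult + age, // accumulator
                        (ir1, ir2) -> ir1 + ir2);                              // combiner
    }

    public static void main(String[] args) {
        // Sequential: the identity is used once, at the start of a single fold.
        System.out.println("sequential, identity 1: " + sumAges(1, false));
        // Parallel: each subtask starts its own fold from the identity.
        System.out.println("parallel,   identity 1: " + sumAges(1, true));
        // With a real identity of +, both modes agree.
        System.out.println("parallel,   identity 0: " + sumAges(0, true));
    }
}
```

The sequential call returns 73. The parallel result depends on how the stream is split into subtasks; for a list this small it is typically 76, as observed in the question.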

When I saw this result, I thought it was strange that parallel and sequential streams return different results.

Am I breaking a contract somewhere?

P.S.

For me, 73 is the expected result, but 76 is not.

java java-8 java-stream reduce
5 answers

An identity value is a value such that x op identity = x. This concept is not unique to Java Streams; see, for example, Wikipedia.

That article lists some examples of identity elements, some of which can be expressed directly in Java code, for example:

  • reduce("", String::concat)
  • reduce(true, (a,b) -> a&&b)
  • reduce(false, (a,b) -> a||b)
  • reduce(Collections.emptySet(), (a,b)->{ Set<X> s=new HashSet<>(a); s.addAll(b); return s; })
  • reduce(Double.POSITIVE_INFINITY, Math::min)
  • reduce(Double.NEGATIVE_INFINITY, Math::max)
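A minimal runnable sketch of several of the identities listed above, assuming small hand-picked input lists. The class name IdentityExamples and its helper methods are illustrative. Because each identity is a true identity for its operation, the sequential and parallel runs print the same values.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Stream;

class IdentityExamples {

    // Returns a sequential or parallel stream over the same list.
    static <T> Stream<T> stream(List<T> list, boolean parallel) {
        return parallel ? list.parallelStream() : list.stream();
    }

    // "" is the identity of String::concat.
    static String concatAll(List<String> words, boolean parallel) {
        return stream(words, parallel).reduce("", String::concat);
    }

    // true is the identity of &&.
    static boolean allTrue(List<Boolean> flags, boolean parallel) {
        return stream(flags, parallel).reduce(true, (a, b) -> a && b);
    }

    // false is the identity of ||.
    static boolean anyTrue(List<Boolean> flags, boolean parallel) {
        return stream(flags, parallel).reduce(false, (a, b) -> a || b);
    }

    // Positive infinity is the identity of Math::min.
    static double minimum(List<Double> values, boolean parallel) {
        return stream(values, parallel).reduce(Double.POSITIVE_INFINITY, Math::min);
    }

    // The empty set is the identity of set union.
    static Set<Integer> unionAll(List<Set<Integer>> sets, boolean parallel) {
        return stream(sets, parallel).reduce(Collections.emptySet(), (a, b) -> {
            Set<Integer> s = new HashSet<>(a);
            s.addAll(b);
            return s;
        });
    }

    public static void main(String[] args) {
        Set<Integer> s1 = new HashSet<>(Arrays.asList(1, 2));
        Set<Integer> s2 = new HashSet<>(Arrays.asList(2, 3));
        List<Set<Integer>> sets = Arrays.asList(s1, s2);
        for (boolean parallel : new boolean[] {false, true}) {
            System.out.println(concatAll(Arrays.asList("a", "b", "c"), parallel));
            System.out.println(allTrue(Arrays.asList(true, false, true), parallel));
            System.out.println(anyTrue(Arrays.asList(false, false, true), parallel));
            System.out.println(minimum(Arrays.asList(3.5, -1.0, 2.0), parallel));
            System.out.println(unionAll(sets, parallel));
        }
    }
}
```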

It should be clear that x + y == x holds for an arbitrary x only when y == 0, so 0 is the identity element for addition. Similarly, 1 is the identity element for multiplication.
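The definition can be checked mechanically. The sketch below uses a hypothetical helper, isIdentityFor, that tests op(x, e) == x and op(e, x) == x on a handful of sample values. Sampling can only refute a candidate, not prove it, so this is a sanity check rather than a proof.

```java
import java.util.function.BinaryOperator;

class IdentityCheck {

    // True if candidate acts as an identity for op on every sample value.
    static boolean isIdentityFor(BinaryOperator<Integer> op, int candidate, int... samples) {
        for (int x : samples) {
            if (op.apply(x, candidate).intValue() != x || op.apply(candidate, x).intValue() != x) {
                return false; // found a counterexample
            }
        }
        return true;
    }

    public static void main(String[] args) {
        BinaryOperator<Integer> add = Integer::sum;
        BinaryOperator<Integer> mul = (a, b) -> a * b;
        int[] samples = {-3, 0, 1, 7, 72};
        System.out.println(isIdentityFor(add, 0, samples)); // true
        System.out.println(isIdentityFor(add, 1, samples)); // false
        System.out.println(isIdentityFor(mul, 1, samples)); // true
        System.out.println(isIdentityFor(mul, 0, samples)); // false
    }
}
```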

More complex examples:

  • Reducing a stream of predicates

     reduce(x->true, Predicate::and)
     reduce(x->false, Predicate::or)
  • Reducing a stream of functions

     reduce(Function.identity(), Function::andThen)
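A small runnable sketch of these two reductions, assuming streams of Predicate<Integer> and Function<Integer, Integer>. The names allOf, anyOf and chain are illustrative. x -> true is neutral for Predicate::and, x -> false for Predicate::or, and Function.identity() for Function::andThen.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

class ReduceFunctionsDemo {

    // Combines all predicates with AND; x -> true is the identity of Predicate::and.
    static Predicate<Integer> allOf(List<Predicate<Integer>> predicates) {
        return predicates.stream().reduce(x -> true, Predicate::and);
    }

    // Combines all predicates with OR; x -> false is the identity of Predicate::or.
    static Predicate<Integer> anyOf(List<Predicate<Integer>> predicates) {
        return predicates.stream().reduce(x -> false, Predicate::or);
    }

    // Chains functions left to right; Function.identity() is the identity of andThen.
    static Function<Integer, Integer> chain(List<Function<Integer, Integer>> functions) {
        return functions.stream().reduce(Function.identity(), Function::andThen);
    }

    public static void main(String[] args) {
        List<Predicate<Integer>> checks = Arrays.asList(n -> n > 0, n -> n % 2 == 0);
        System.out.println(allOf(checks).test(4));  // true: positive and even
        System.out.println(allOf(checks).test(3));  // false: odd
        System.out.println(anyOf(checks).test(-2)); // true: even
        List<Function<Integer, Integer>> steps = Arrays.asList(n -> n + 1, n -> n * 10);
        System.out.println(chain(steps).apply(2));  // (2 + 1) * 10 = 30
    }
}
```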

Yes, you are violating the contract of the combiner function. The identity, which is the first argument of reduce, must satisfy combiner(identity, u) == u. Quoting the Javadoc of Stream.reduce:

The identity value must be an identity for the combiner function. This means that for all u, combiner(identity, u) is equal to u.

However, your combiner function performs addition, and 1 is not the identity element for addition; 0 is.

  • Change the identity to 0 and you will no longer be surprised: the result will be 72 in both cases.

  • For your own entertainment, change both the accumulator and the combiner to perform multiplication (keeping the identity as 1), and you will also get the same result in both cases.
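Both suggestions can be checked with a short sketch. It assumes the ages are available as a plain List<Integer> (a simplification of the Person model), and CorrectIdentityDemo is an illustrative name. With 0 for addition, or 1 when both functions multiply, the sequential and parallel results agree.

```java
import java.util.Arrays;
import java.util.List;

class CorrectIdentityDemo {

    static final List<Integer> AGES = Arrays.asList(12, 32, 10, 18);

    // 0 is the identity of +, so the result no longer depends on how the stream is split.
    static int sum(boolean parallel) {
        return (parallel ? AGES.parallelStream() : AGES.stream())
                .reduce(0, (acc, age) -> acc + age, (a, b) -> a + b);
    }

    // Accumulator and combiner both multiply, so 1 is a valid identity.
    static long product(boolean parallel) {
        return (parallel ? AGES.parallelStream() : AGES.stream())
                .reduce(1L, (acc, age) -> acc * age, (a, b) -> a * b);
    }

    public static void main(String[] args) {
        System.out.println(sum(false) + " " + sum(true));         // 72 72
        System.out.println(product(false) + " " + product(true)); // 69120 69120
    }
}
```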

Let's build an example where the identity is neither 0 nor 1. Using your own domain class, consider:

    System.out.println(Person.getPersons().stream()
            .reduce("",
                    (acc, p) -> acc.length() > p.name.length() ? acc : p.name,
                    (n1, n2) -> n1.length() > n2.length() ? n1 : n2));

This reduces the stream of Persons to the longest person's name.
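A self-contained version of this example, assuming a minimal stand-in Person class, since the question elides its constructor. The "" identity works because combiner("", n) always returns n. Ties go to the later name in both the accumulator and the combiner, so ordering stays consistent between modes.

```java
import java.util.Arrays;
import java.util.List;

class LongestNameDemo {

    // Hypothetical minimal stand-in for the question's Person class.
    static class Person {
        final String name;
        final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    static final List<Person> PERSONS = Arrays.asList(
            new Person("Vasya", 12), new Person("Petya", 32),
            new Person("Serj", 10), new Person("Onotole", 18));

    // "" is an identity for the combiner: "".length() > n.length() is never true.
    static String longestName(boolean parallel) {
        return (parallel ? PERSONS.parallelStream() : PERSONS.stream())
                .reduce("",
                        (acc, p) -> acc.length() > p.name.length() ? acc : p.name,
                        (n1, n2) -> n1.length() > n2.length() ? n1 : n2);
    }

    public static void main(String[] args) {
        System.out.println(longestName(false)); // Onotole
        System.out.println(longestName(true));  // Onotole
    }
}
```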


The Javadoc for Stream.reduce states that

The identity value must be an identity for the combiner function

1 is not an identity value for the addition operator, so you get unexpected results. If you use 0 (the identity of addition), you will get the same result from sequential and parallel streams.


In addition to the excellent answers posted before, it should be mentioned that if you want to start summing from something other than zero, you can simply move the initial addend out of the stream operation:

    Integer summaryAge = Person.getPersons().stream()
            //.parallel()  //will return no surprising result
            .reduce(0, (intermediateResult, p) -> intermediateResult + p.age,
                    (ir1, ir2) -> ir1 + ir2) + 1;

The same is possible for other reduction operations. For example, if you want to calculate a product starting with 2, then instead of the incorrect .reduce(2, (a, b) -> a*b) you can write .reduce(1, (a, b) -> a*b)*2. Just find the real identity for your operation, move the "false identity" outside, and you will get the correct result in both the sequential and parallel cases.
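A sketch of this "move the false identity outside" pattern for both examples. It assumes plain integer lists (the FACTORS values are made up for illustration) and illustrative method names. The productWrong method keeps the false identity 2 inside reduce for contrast; its parallel result depends on how the stream is split, so no value is claimed for it.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

class OffsetOutsideDemo {

    static final List<Integer> AGES = Arrays.asList(12, 32, 10, 18);
    static final List<Integer> FACTORS = Arrays.asList(3, 5, 7); // hypothetical input

    static <T> Stream<T> stream(List<T> list, boolean parallel) {
        return parallel ? list.parallelStream() : list.stream();
    }

    // Sum starting from 1: reduce with the real identity 0, then add the offset once.
    static int sumPlusOne(boolean parallel) {
        return stream(AGES, parallel).reduce(0, Integer::sum) + 1;
    }

    // Product starting from 2: reduce with the real identity 1, then multiply by 2 once.
    static int productTimesTwo(boolean parallel) {
        return stream(FACTORS, parallel).reduce(1, (a, b) -> a * b) * 2;
    }

    // False identity kept inside reduce: correct sequentially, split-dependent in parallel.
    static int productWrong(boolean parallel) {
        return stream(FACTORS, parallel).reduce(2, (a, b) -> a * b);
    }

    public static void main(String[] args) {
        for (boolean parallel : new boolean[] {false, true}) {
            System.out.println(sumPlusOne(parallel) + " " + productTimesTwo(parallel)
                    + " " + productWrong(parallel));
        }
    }
}
```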

Finally, note that there is a more efficient way to solve your problem:

    Integer summaryAge = Person.getPersons().stream()
            //.parallel()  //will return no surprising result
            .collect(Collectors.summingInt(p -> p.age)) + 1;

or alternatively

    Integer summaryAge = Person.getPersons().stream()
            //.parallel()  //will return no surprising result
            .mapToInt(p -> p.age).sum() + 1;

Here, the summation is performed without boxing at each intermediate step, so it can be much faster.


There are really two parts to your question: why you get 76 with a parallel stream when you get 73 with a sequential one, and what the identity is for multiplication and addition in reduce.

Answering the latter will help answer the former. Identity is a mathematical concept; I will try to stick to simple terms for those who are not mathematicians. An identity is a value that, when combined with another value by the operation, returns that other value unchanged.

The additive identity is 0. If a is any number, the identity property means that a plus the identity returns a (basically, a + 0 = a). The multiplicative identity says that b times the identity (which is 1) always returns b itself.

Java's reduce method uses the identity a little more loosely, which lets us perform addition or multiplication with an extra step if we want. If you take your example and change the identity to 0, you will get 72:

    Integer summaryAge = Person.getPersons().stream()
            .reduce(0, (intermediateResult, p) -> intermediateResult + p.age,
                    (ir1, ir2) -> ir1 + ir2);
    System.out.println(summaryAge);

It simply sums the ages and returns the total. Change the identity to 100 and you get 172. But when you run it in parallel, why does your result become 76, and why does my example return 472? Because a stream processes the data as an aggregate rather than element by element. From the Javadoc of the java.util.stream package:

Streams facilitate parallel execution by reframing the computation as a pipeline of aggregate operations, rather than as imperative operations on each individual element.

Why does processing the data as an aggregate matter? With a sequential stream (no parallel() or parallelStream()), your example starts from the identity once and adds every element to that single running sum. That is why you get 73, and why I get 172 with an identity of 100. So why do you get 76 in parallel, or 472 in my example? Because Java now splits the data into smaller parts (here, single elements), starts each part from the identity (which you specified as 1), sums that part, and then combines the result with the results of the other parts, each of which performed the same operation. The identity is therefore added once per part.
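The splitting argument can be modeled in a few lines. This is a simplified, single-threaded model, not the JDK's implementation: the chunkSize parameter is an assumption standing in for however the real framework decides to split. Each chunk is folded from the identity and the partial results are then merged with the combiner, which reproduces the 73/76 and 172/472 numbers discussed above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.BinaryOperator;

// Simplified model of how a parallel reduce uses the identity; NOT the JDK implementation.
class ChunkedReduceModel {

    static <T, U> U chunkedReduce(List<T> items, int chunkSize, U identity,
                                  BiFunction<U, ? super T, U> accumulator,
                                  BinaryOperator<U> combiner) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        // 1. Fold each chunk separately, each one starting from the identity.
        List<U> partials = new ArrayList<>();
        for (int start = 0; start < items.size(); start += chunkSize) {
            U acc = identity;
            int end = Math.min(start + chunkSize, items.size());
            for (T item : items.subList(start, end)) {
                acc = accumulator.apply(acc, item);
            }
            partials.add(acc);
        }
        // 2. Merge the partial results left to right with the combiner.
        if (partials.isEmpty()) {
            return identity;
        }
        U result = partials.get(0);
        for (int i = 1; i < partials.size(); i++) {
            result = combiner.apply(result, partials.get(i));
        }
        return result;
    }

    // The question's sum of ages, with a configurable identity and chunk size.
    static int sumAges(int identity, int chunkSize) {
        List<Integer> ages = Arrays.asList(12, 32, 10, 18);
        return chunkedReduce(ages, chunkSize, identity,
                (acc, age) -> acc + age, (a, b) -> a + b);
    }

    public static void main(String[] args) {
        System.out.println(sumAges(1, 4));   // one chunk: 72 + 1 = 73
        System.out.println(sumAges(1, 1));   // four chunks: 72 + 4 * 1 = 76
        System.out.println(sumAges(100, 4)); // 72 + 100 = 172
        System.out.println(sumAges(100, 1)); // 72 + 4 * 100 = 472
        System.out.println(sumAges(0, 1));   // 0 is the real identity: 72
    }
}
```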

If your intent is to add 1 to the result, it is safer to follow Tagir's suggestion and add 1 at the end, after the stream operation returns.

