Choose between Stream and Collections API

Consider the following example, which prints the maximum element in a List:

 List<Integer> list = Arrays.asList(1, 4, 3, 9, 7, 4, 8);
 list.stream().max(Comparator.naturalOrder()).ifPresent(System.out::println);

The same goal can be achieved using the Collections.max method:

 System.out.println(Collections.max(list)); 

The Collections.max version is not only shorter but also, in my opinion, easier to read. Similar examples come to mind, such as using binarySearch versus filter combined with findAny.
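One difference the two max snippets hide is how they treat an empty list. Collections.max throws NoSuchElementException, while Stream.max returns an empty Optional. A small sketch (the class and method names are only illustrative):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

class MaxComparison {
    // Stream variant: the result is an Optional, empty for an empty list
    static Optional<Integer> maxWithStream(List<Integer> list) {
        return list.stream().max(Comparator.naturalOrder());
    }

    // Collections variant: returns the element directly,
    // but throws NoSuchElementException for an empty list
    static Integer maxWithCollections(List<Integer> list) {
        return Collections.max(list);
    }

    public static void main(String[] args) {
        List<Integer> list = Arrays.asList(1, 4, 3, 9, 7, 4, 8);
        maxWithStream(list).ifPresent(System.out::println); // prints 9
        System.out.println(maxWithCollections(list));        // prints 9
    }
}
```

So "shorter" also means choosing between an exception and an Optional when the list might be empty.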

I understand that a Stream can be an infinite pipeline, whereas a Collection is limited by the memory available to the JVM. That would be my criterion for deciding whether to use the Stream API or the Collections API. Are there other reasons for choosing Stream over the Collections API (e.g. performance)? More generally, is this the only reason to prefer the Stream API over an older API that can do the job more cleanly and concisely?
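To make the "infinite pipeline" point concrete, the sketch below (class and method names are illustrative) builds an unbounded sequence with Stream.iterate, which no Collection could hold. Elements are only computed as a limit or a short-circuiting terminal operation such as findFirst pulls them.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class InfiniteStreamDemo {
    // An unbounded sequence 1, 2, 4, 8, ...; nothing is computed yet
    static Stream<Long> powersOfTwo() {
        return Stream.iterate(1L, x -> x * 2);
    }

    // limit(n) makes the pipeline finite, so only n elements are materialized
    static List<Long> firstPowersOfTwo(int n) {
        return powersOfTwo().limit(n).collect(Collectors.toList());
    }

    // findFirst short-circuits on the infinite stream itself
    // (assumes bound < 2^62, so the doubling never overflows)
    static long firstPowerOfTwoAbove(long bound) {
        return powersOfTwo().filter(x -> x > bound).findFirst().get();
    }

    public static void main(String[] args) {
        System.out.println(firstPowersOfTwo(5));        // [1, 2, 4, 8, 16]
        System.out.println(firstPowerOfTwoAbove(1000)); // 1024
    }
}
```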

2 answers

The Stream API is like a Swiss Army knife: it lets you perform fairly complex operations by combining tools effectively. On the other hand, if you only need a screwdriver, a standalone screwdriver may be more convenient. The Stream API includes many things (e.g. distinct, sorted, primitive specializations, etc.) that would otherwise require you to write several lines of code, introduce intermediate variables and data structures, and write tedious loops that distract the programmer from the actual algorithm. Sometimes using the Stream API can even improve performance for sequential code. For example, consider this old-style API:

 class Group {
     private Map<String, User> users;

     public List<User> getUsers() {
         return new ArrayList<>(users.values());
     }
 }

Here we want to return all users of the group. The API designer decided to return a List, but callers can use it in various ways:

 List<User> users = group.getUsers();
 Collections.sort(users);
 someOtherMethod(users.toArray(new User[users.size()]));

Here the list is sorted and converted to an array to pass to some other method that accepts an array. Elsewhere, getUsers() might be used like this:

 List<User> users = group.getUsers();
 for (User user : users) {
     if (user.getAge() < 18) {
         throw new IllegalStateException("Underage user in selected group!");
     }
 }

Here we just want to check that every user meets some criterion. In both cases, copying into an intermediate ArrayList was not actually necessary. When we move to Java 8, we can replace the getUsers() method with users():

 public Stream<User> users() {
     return users.values().stream();
 }

Then update the callers. First:

 someOtherMethod(group.users().sorted().toArray(User[]::new)); 

Second:

 if (group.users().anyMatch(user -> user.getAge() < 18)) {
     throw new IllegalStateException("Underage user in selected group!");
 }

So the code is not only shorter but may also run faster, because it skips the intermediate copy.
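Putting the pieces together, here is a self-contained sketch of the stream-returning Group. The answer does not show User or someOtherMethod, so both are hypothetical: User is assumed to have a name and an age and to be Comparable by name (sorted() and Collections.sort both need a natural ordering), and someOtherMethod simply joins the names of the array it receives.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Stream;

// Hypothetical User: natural ordering by name is an assumption
class User implements Comparable<User> {
    private final String name;
    private final int age;

    User(String name, int age) {
        this.name = name;
        this.age = age;
    }

    String getName() { return name; }
    int getAge() { return age; }

    @Override
    public int compareTo(User other) {
        return name.compareTo(other.name);
    }
}

class Group {
    private final Map<String, User> users = new LinkedHashMap<>();

    void add(User user) {
        users.put(user.getName(), user);
    }

    // Old style: every call copies the values into a fresh ArrayList
    public List<User> getUsers() {
        return new ArrayList<>(users.values());
    }

    // Stream style: streams over the map values without copying them first
    public Stream<User> users() {
        return users.values().stream();
    }
}

class GroupDemo {
    // Hypothetical stand-in for someOtherMethod(User[]): joins the names
    static String someOtherMethod(User[] array) {
        StringBuilder sb = new StringBuilder();
        for (User u : array) {
            if (sb.length() > 0) sb.append(',');
            sb.append(u.getName());
        }
        return sb.toString();
    }

    // First caller: sort, then convert to an array
    static String sortedNames(Group group) {
        return someOtherMethod(group.users().sorted().toArray(User[]::new));
    }

    // Second caller: anyMatch stops at the first underage user
    static void checkNoMinors(Group group) {
        if (group.users().anyMatch(user -> user.getAge() < 18)) {
            throw new IllegalStateException("Underage user in selected group!");
        }
    }

    public static void main(String[] args) {
        Group group = new Group();
        group.add(new User("carol", 30));
        group.add(new User("alice", 25));
        group.add(new User("bob", 41));
        System.out.println(sortedNames(group)); // alice,bob,carol
        checkNoMinors(group);                   // passes: nobody is under 18
    }
}
```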

Another conceptual point of the Stream API is that any stream code written as recommended (stateless, non-interfering operations without side effects) can be parallelized simply by adding a parallel() step. Of course, this does not always improve performance, but it helps more often than I expected. As a rule of thumb, if an operation takes 0.1 ms or more when run sequentially, it may benefit from parallelization. In any case, Java has never before offered such an easy way to do parallel programming.
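A minimal sketch of the "just add parallel()" idea (names are illustrative): the two pipelines below differ only in the parallel() step. Because map and sum are stateless and free of side effects, both produce the same result; whether the parallel one is actually faster depends on the workload and the machine, so no timing is claimed here.

```java
import java.util.stream.LongStream;

class ParallelDemo {
    // Sequential pipeline: sum of squares of 1..n
    static long sumOfSquares(long n) {
        return LongStream.rangeClosed(1, n).map(x -> x * x).sum();
    }

    // Identical pipeline with a parallel() step added
    static long sumOfSquaresParallel(long n) {
        return LongStream.rangeClosed(1, n).parallel().map(x -> x * x).sum();
    }

    public static void main(String[] args) {
        long n = 1_000_000;
        System.out.println(sumOfSquares(n) == sumOfSquaresParallel(n)); // true
    }
}
```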


Of course, it always depends on the circumstances. Take the original example:

 List<Integer> list = Arrays.asList(1, 4, 3, 9, 7, 4, 8);
 list.stream().max(Comparator.naturalOrder()).ifPresent(System.out::println);

If you want to do the same thing efficiently, you should use

 IntStream.of(1,4,3,9,7,4,8).max().ifPresent(System.out::println); 

which involves no boxing at all. But if you start from an existing List<Integer>, this might not be an option, so if you're just interested in the maximum value, Collections.max might be the simpler choice.
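The three options side by side, as a sketch with illustrative names: a primitive IntStream from the start, Collections.max on an existing List<Integer>, and, as a stream alternative on that list, mapToInt, which unboxes each element once and lets IntStream.max() work without a Comparator.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.OptionalInt;
import java.util.stream.IntStream;

class BoxingDemo {
    // Primitive values from the start: no Integer objects involved
    static OptionalInt maxOfInts(int... values) {
        return IntStream.of(values).max();
    }

    // Existing List<Integer>: the elements are already boxed,
    // so Collections.max is the most direct route
    static int maxOfList(List<Integer> list) {
        return Collections.max(list);
    }

    // Stream alternative on the boxed list: unbox via mapToInt
    static OptionalInt maxOfListViaStream(List<Integer> list) {
        return list.stream().mapToInt(Integer::intValue).max();
    }

    public static void main(String[] args) {
        maxOfInts(1, 4, 3, 9, 7, 4, 8).ifPresent(System.out::println); // 9
        List<Integer> list = Arrays.asList(1, 4, 3, 9, 7, 4, 8);
        System.out.println(maxOfList(list));                            // 9
        maxOfListViaStream(list).ifPresent(System.out::println);        // 9
    }
}
```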

But that raises the question of why you have a List<Integer> in the first place. Maybe it is the result of old code (or new code written with the old mindset), which had no choice but to use boxing and a Collection because there was no alternative in the past?

So maybe you should think about the source that produces the collection before worrying about how to consume it (or, better, think about both at the same time).
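As a sketch of what "thinking about the source" can mean, here are two hypothetical producers that parse comma-separated numbers. The first follows the old mindset and boxes everything into a List<Integer>; the second returns an IntStream so the consumer can stay primitive. The CSV input and both method names are assumptions made only for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.IntStream;

class SourceDemo {
    private static final Pattern COMMA = Pattern.compile(",");

    // Hypothetical old-style producer: every value is boxed into a List<Integer>
    static List<Integer> readValuesAsList(String csv) {
        List<Integer> result = new ArrayList<>();
        for (String s : csv.split(",")) {
            result.add(Integer.parseInt(s.trim()));
        }
        return result;
    }

    // Hypothetical producer designed with the consumer in mind:
    // yields primitive ints lazily, with no intermediate collection
    static IntStream readValues(String csv) {
        return COMMA.splitAsStream(csv).map(String::trim).mapToInt(Integer::parseInt);
    }

    public static void main(String[] args) {
        String csv = "1, 4, 3, 9, 7, 4, 8";
        System.out.println(Collections.max(readValuesAsList(csv))); // 9
        readValues(csv).max().ifPresent(System.out::println);       // 9
    }
}
```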

If all you have is a Collection, and all you need is a single terminal operation for which there is a simple Collection-based implementation, you can use it directly without bothering with the Stream API. The API designers acknowledged this by adding methods such as forEach(…) to the Collection API instead of insisting that everyone use stream().forEach(…). And Collection.forEach(…) is not simply shorthand for Collection.stream().forEach(…); in fact, it is defined on the more abstract Iterable interface, which does not even have a stream() method.
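A small sketch of that last point: Countdown below is a hypothetical Iterable that is not a Collection, so it has no stream() method, yet it still gets forEach as a default method inherited from Iterable. Any Collection can call forEach directly in the same way.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical Iterable: counts down from start to 1; no stream() available
class Countdown implements Iterable<Integer> {
    private final int start;

    Countdown(int start) {
        this.start = start;
    }

    @Override
    public Iterator<Integer> iterator() {
        return new Iterator<Integer>() {
            private int next = start;

            @Override
            public boolean hasNext() {
                return next > 0;
            }

            @Override
            public Integer next() {
                if (!hasNext()) throw new NoSuchElementException();
                return next--;
            }
        };
    }
}

class ForEachDemo {
    static List<Integer> collect(Iterable<Integer> source) {
        List<Integer> out = new ArrayList<>();
        // Iterable.forEach is a default method; no stream involved
        source.forEach(out::add);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(collect(new Countdown(3)));   // [3, 2, 1]
        // Any Collection can call forEach without stream() as well
        Arrays.asList("a", "b", "c").forEach(System.out::println);
    }
}
```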

By the way, you should understand the difference between Collections.binarySearch and Stream.filter/findAny. The former requires the collection to be sorted, and if this condition is met, it might be the better choice. But if the collection is not sorted, a simple linear search is more efficient than sorting it just for a single binary search, not to mention that binary search works only with a List, whereas filter/findAny works with any stream, and therefore with any kind of source collection.
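The contrast in code, as a sketch with illustrative names: binarySearch is only meaningful on a list sorted in ascending natural order (on an unsorted list its result is undefined), while filter/findAny scans any order and stops at the first match.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

class SearchDemo {
    // Precondition: sorted ascending. Logarithmic on a random-access list;
    // a non-negative result means the key was found
    static boolean containsSorted(List<Integer> sorted, int key) {
        return Collections.binarySearch(sorted, key) >= 0;
    }

    // No ordering requirement: linear scan that stops at the first match
    static Optional<Integer> findWithStream(List<Integer> list, int key) {
        return list.stream().filter(x -> x == key).findAny();
    }

    public static void main(String[] args) {
        List<Integer> unsorted = Arrays.asList(1, 4, 3, 9, 7, 4, 8);
        System.out.println(findWithStream(unsorted, 9).isPresent()); // true

        List<Integer> sorted = Arrays.asList(1, 3, 4, 4, 7, 8, 9);
        System.out.println(containsSorted(sorted, 9));               // true
        System.out.println(containsSorted(sorted, 5));               // false
    }
}
```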
