What is the best way to combine several streams into one distinct, sorted stream with Java 8?

Suppose I have several Java 8 streams, each of which could be converted to a Set<AppStory>. I want the best-performing way to combine all of them into one stream that is distinct by ID and sorted by the property "lastUpdate".

There are several ways to do this, but I want the fastest. This is what I do now:

    Set<AppStory> appStr1 = StreamSupport.stream(splititerato1, true)
            .map(storyId1 -> vertexToStory1(storyId1))
            .collect(toSet());
    Set<AppStory> appStr2 = StreamSupport.stream(splititerato2, true)
            .map(storyId2 -> vertexToStory2(storyId2))
            .collect(toSet());
    Set<AppStory> appStr3 = StreamSupport.stream(splititerato3, true)
            .map(storyId3 -> vertexToStory3(storyId3))
            .collect(toSet());

    Set<AppStory> set = new HashSet<>();
    set.addAll(appStr1);
    set.addAll(appStr2);
    set.addAll(appStr3);
    // ...and then sort by "lastUpdate"

    // POJO:
    public class AppStory implements Comparable<AppStory> {
        private String storyId;
        // private String ... many other attributes ...

        public String getStoryId() {
            return storyId;
        }

        @Override
        public int compareTo(AppStory o) {
            return this.getStoryId().compareTo(o.getStoryId());
        }
    }

...but this feels like the old way of doing it.

How can I produce a single distinct, sorted stream with the best possible performance?

Something like:

    Set<AppStory> finalSet = distinctStream
            .sorted(comparator) // the exact comparison is not my issue
            .collect(toSet());

Any ideas?

BR

Vitaliy
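One detail matters for every approach in this thread: HashSet and Stream::distinct decide duplicates with equals/hashCode, not compareTo. The AppStory shown above only overrides compareTo, so two instances with the same storyId would both be kept. A minimal sketch, assuming identity is defined by storyId alone; the wrapper class AppStoryEquality and the long lastUpdate field are hypothetical additions for illustration:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;
import java.util.stream.Collectors;

public class AppStoryEquality {

    // Hypothetical, trimmed-down AppStory whose identity is the storyId only.
    static final class AppStory implements Comparable<AppStory> {
        private final String storyId;
        private final long lastUpdate; // assumed field, not shown in the question

        AppStory(String storyId, long lastUpdate) {
            this.storyId = storyId;
            this.lastUpdate = lastUpdate;
        }

        String getStoryId() { return storyId; }
        long getLastUpdate() { return lastUpdate; }

        @Override
        public int compareTo(AppStory o) {
            return storyId.compareTo(o.storyId);
        }

        // equals/hashCode agree with compareTo, so HashSet and distinct() dedupe by ID.
        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof AppStory)) return false;
            return Objects.equals(storyId, ((AppStory) o).storyId);
        }

        @Override
        public int hashCode() {
            return Objects.hashCode(storyId);
        }
    }

    // distinct() keeps the first occurrence of each equal element in a sequential stream.
    static List<String> distinctIds(List<AppStory> stories) {
        return stories.stream()
                .distinct()
                .map(AppStory::getStoryId)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<AppStory> input = Arrays.asList(
                new AppStory("a", 3L), new AppStory("b", 1L), new AppStory("a", 2L));
        Set<AppStory> set = new HashSet<>(input);
        System.out.println(set.size());         // 2
        System.out.println(distinctIds(input)); // [a, b]
    }
}
```

Keeping equals consistent with compareTo also means a TreeSet and a HashSet of the same stories agree on which ones are duplicates.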

2 answers

As you stated in the comments, I think the parallel overhead is much greater than the actual work, so let your streams do the job sequentially.
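Whether a stream built from a spliterator runs in parallel is decided by the second argument of StreamSupport.stream. A minimal sketch, assuming the IDs come from a plain List<String> (a stand-in for splititerato1) and a hypothetical vertexToStory that only builds a label:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class SequentialFromSpliterator {

    // Hypothetical stand-in for vertexToStory1: here it just builds a label.
    static String vertexToStory(String storyId) {
        return "story-" + storyId;
    }

    // parallel = false: elements are processed on the calling thread,
    // with no fork/join splitting and merging for cheap per-element work.
    static Stream<String> storiesFrom(Spliterator<String> ids) {
        return StreamSupport.stream(ids, false)
                .map(SequentialFromSpliterator::vertexToStory);
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("1", "2", "3");
        System.out.println(storiesFrom(ids.spliterator()).isParallel()); // false
        System.out.println(storiesFrom(ids.spliterator())
                .collect(Collectors.toList())); // [story-1, story-2, story-3]
    }
}
```

Whether sequential actually wins depends on how expensive the real vertexToStory calls are, so it is worth measuring both settings with a benchmark harness such as JMH.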

FYI: you should prefer Stream::concat here, because in Java 8 short-circuiting operations such as Stream::limit are not propagated into the inner streams of Stream::flatMap (each inner stream is consumed in full).
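A small check of the short-circuiting behaviour with Stream.concat, using peek purely as a counter (a debugging aid, not something to rely on in production code). The class name ConcatShortCircuit is hypothetical. With concat, limit(2) stops the pipeline after two elements:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ConcatShortCircuit {

    // Concatenates two streams, counts how many elements flow past peek(),
    // and lets limit(2) cut the pipeline short.
    static List<Integer> firstTwo(AtomicInteger visited) {
        return Stream.concat(Stream.of(1, 2, 3), Stream.of(4, 5, 6))
                .peek(x -> visited.incrementAndGet()) // counts elements that actually flow
                .limit(2)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        AtomicInteger visited = new AtomicInteger();
        System.out.println(firstTwo(visited)); // [1, 2]
        System.out.println(visited.get());     // 2
    }
}
```

On Java 8 the equivalent Stream.of(a, b).flatMap(Function.identity()) version would push the whole first inner stream through peek; Java 10 and later made flatMap short-circuit as well, so the difference mainly matters for the Java 8 setting of the question.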

Stream::sorted buffers every element of the stream, sorts the buffer, and then pushes the elements through the rest of the pipeline in sorted order, after which they are collected again. You can avoid that extra pass by collecting the elements into a List and sorting the List afterwards. A List is a much better choice than a Set here, because order matters (I know there is LinkedHashSet, but you can't sort it in place).
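The two options described above can be compared side by side. A minimal sketch, assuming a hypothetical Story type with an ID and a long lastUpdate; in the real code the input stream would be the concatenated, distinct() pipeline. Note that Collectors.toList() makes no guarantee about mutability, so the in-place variant asks for an ArrayList explicitly via Collectors.toCollection:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class SortAfterCollect {

    // Hypothetical, trimmed-down story: an ID plus an assumed lastUpdate timestamp.
    static final class Story {
        final String id;
        final long lastUpdate;

        Story(String id, long lastUpdate) {
            this.id = id;
            this.lastUpdate = lastUpdate;
        }

        long getLastUpdate() { return lastUpdate; }
    }

    static final Comparator<Story> BY_LAST_UPDATE = Comparator.comparingLong(Story::getLastUpdate);

    // Stream::sorted buffers all elements, sorts them, then pushes them on to be collected again.
    static List<Story> viaStreamSorted(List<Story> input) {
        return input.stream().sorted(BY_LAST_UPDATE).collect(Collectors.toList());
    }

    // Collect once into a list that is known to be mutable, then sort it in place.
    static List<Story> viaListSort(List<Story> input) {
        List<Story> stories = input.stream().collect(Collectors.toCollection(ArrayList::new));
        stories.sort(BY_LAST_UPDATE);
        return stories;
    }

    // Concatenates the IDs so the resulting order is easy to inspect.
    static String ids(List<Story> stories) {
        StringBuilder sb = new StringBuilder();
        for (Story s : stories) sb.append(s.id);
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Story> input = Arrays.asList(new Story("a", 30L), new Story("b", 10L), new Story("c", 20L));
        System.out.println(ids(viaStreamSorted(input))); // bca
        System.out.println(ids(viaListSort(input)));     // bca
    }
}
```

Both produce the same order; the second simply skips the intermediate buffer that sorted() creates.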

In my opinion this is the cleanest and possibly the fastest solution, though I can't prove the latter without measuring.

    Stream<AppStory> appStr1 = StreamSupport.stream(splititerato1, false)
            .map(this::vertexToStory1);
    Stream<AppStory> appStr2 = StreamSupport.stream(splititerato2, false)
            .map(this::vertexToStory2);
    Stream<AppStory> appStr3 = StreamSupport.stream(splititerato3, false)
            .map(this::vertexToStory3);

    List<AppStory> stories = Stream.concat(Stream.concat(appStr1, appStr2), appStr3)
            .distinct()
            .collect(Collectors.toList());

    // assuming AppStory::getLastUpdateTime is of type `long`
    stories.sort(Comparator.comparingLong(AppStory::getLastUpdateTime));

I can't guarantee that this will be faster than yours (I think it will be, but you'll need to measure to be sure), but assuming you have 3 streams, you can simply do:

    List<AppStory> distinctSortedAppStories = Stream.of(stream1, stream2, stream3)
            .flatMap(Function.identity())
            .map(this::vertexToStory)
            .distinct()
            .sorted(Comparator.comparing(AppStory::getLastUpdate))
            .collect(Collectors.toList());
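If AppStory cannot be given an equals/hashCode based on storyId, distinct() will not remove duplicate IDs. A different technique, not the one used in either answer, is to key the elements by ID with Collectors.toMap and sort the surviving values. A sketch, assuming a hypothetical Story type with getStoryId() and a long getLastUpdate(), and assuming that on a duplicate ID the first element seen should win:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class DistinctByIdThenSort {

    // Hypothetical story type WITHOUT equals/hashCode overrides.
    static final class Story {
        final String storyId;
        final long lastUpdate;

        Story(String storyId, long lastUpdate) {
            this.storyId = storyId;
            this.lastUpdate = lastUpdate;
        }

        String getStoryId() { return storyId; }
        long getLastUpdate() { return lastUpdate; }
    }

    static List<Story> distinctSorted(Stream<Story> s1, Stream<Story> s2, Stream<Story> s3) {
        Map<String, Story> byId = Stream.of(s1, s2, s3)
                .flatMap(Function.identity())
                // key on storyId; on a duplicate ID keep the element seen first
                .collect(Collectors.toMap(Story::getStoryId, Function.identity(), (first, second) -> first));
        List<Story> result = new ArrayList<>(byId.values());
        result.sort(Comparator.comparingLong(Story::getLastUpdate));
        return result;
    }

    // Joins the IDs with commas so the final order is easy to inspect.
    static String ids(List<Story> stories) {
        return stories.stream().map(Story::getStoryId).collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        List<Story> out = distinctSorted(
                Stream.of(new Story("a", 30L), new Story("b", 10L)),
                Stream.of(new Story("a", 99L)),
                Stream.of(new Story("c", 20L)));
        System.out.println(ids(out)); // b,c,a
    }
}
```

The merge function decides which duplicate survives; replacing it with one that keeps the larger lastUpdate would keep the most recent version of each story instead.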
