MinMaxPriorityQueue using Java threads

I am looking for a memory efficient way in Java to find the top n elements from a huge collection. For example, I have a word, a distance () method, and a collection of β€œall” words. I implemented a couple of classes that implements compareTo (), so the pairs are sorted by their values.

Using threads, my naive solution looks like this:

double distance(String word1, String word2){ ... } Collection<String> words = ...; String word = "..."; words.stream() .map(w -> new Pair<String, Double>(w, distance(word, w))) .sorted() .limit(n); 

As far as I understand, this will process and move each element in words so that it can be sorted before applying limit (). However, it is more efficient in terms of memory to have a collection in which n items are stored, and whenever a new item is added, it removes the smallest item (in accordance with the natural order of the comparable object) and therefore will never be anymore. than n (or n + 1).

This is exactly what Guava MinMaxPriorityQueue does . So my current best solution to this problem is:

 Queue<Pair<String, Double>> neighbours = MinMaxPriorityQueue.maximumSize(n).create(); words.stream() .forEach(w -> neighbours.add(new Pair<String, Double>(w, distance(word, w))); 

Sorting of the top n elements remains to be done after converting the queue to a stream or list, but this is not a problem since n is relatively small.

My question is: is there a way to do the same using streams?

+7
java sorting guava java-8 java-stream
source share
1 answer

A heap-based structure will of course be more efficient than sorting an entire huge list. Fortunately, the thread library is delighted to use specialized collections if necessary:

 MinMaxPriorityQueue<Pair<String, Double>> topN = words.stream() .map(w -> new Pair<String, Double>(w, distance(word, w))) .collect(toCollection( () -> MinMaxPriorityQueue.maximumSize(n).create() )); 

This is better than the .forEach solution because it is easy to parallelize and more idiomatically java8.

Please note that () -> MinMaxPriorityQueue.maximumSize(n).create() should be replaced by MinMaxPriorityQueue.maximumSize(n)::create , but for some reason it will not compile under certain conditions (see comments below )

+2
source share

All Articles