I am looking for a memory efficient way in Java to find the top n elements from a huge collection. For example, I have a word, a distance () method, and a collection of βallβ words. I implemented a couple of classes that implements compareTo (), so the pairs are sorted by their values.
Using threads, my naive solution looks like this:
double distance(String word1, String word2){ ... } Collection<String> words = ...; String word = "..."; words.stream() .map(w -> new Pair<String, Double>(w, distance(word, w))) .sorted() .limit(n);
As far as I understand, this will process and move each element in words so that it can be sorted before applying limit (). However, it is more efficient in terms of memory to have a collection in which n items are stored, and whenever a new item is added, it removes the smallest item (in accordance with the natural order of the comparable object) and therefore will never be anymore. than n (or n + 1).
This is exactly what Guava MinMaxPriorityQueue does . So my current best solution to this problem is:
Queue<Pair<String, Double>> neighbours = MinMaxPriorityQueue.maximumSize(n).create(); words.stream() .forEach(w -> neighbours.add(new Pair<String, Double>(w, distance(word, w)));
Sorting of the top n elements remains to be done after converting the queue to a stream or list, but this is not a problem since n is relatively small.
My question is: is there a way to do the same using streams?
java sorting guava java-8 java-stream
Carsten
source share