How to find the largest M numbers of N numbers in Java 8?

IntStream may be the easiest way, but I can only pick the smallest numbers M, as shown below:

public class Test { private static final int[] arr = {5, 3, 4, 2, 9, 1, 7, 8, 6}; public static void main(String[] args) throws Exception { System.out.println(Arrays.asList(IntStream.of(arr).sorted().limit(5).boxed().toArray())); } } 

btw, given the complexity of the algorithm and assuming that N β†’ M, the sorted + limit approach has complexity O (N log (N)).

I think O (N log (M)) can achieve the best complexity, but I don't know if Java 8 has such stream methods or collectors.

+3
source share
5 answers

If you must use Streams:

 IntStream.of(arr).sorted().skip(NM) 

Otherwise, use PriorityQueue and write yourself an inverted Comparator . The insert will be O (N (log (N)), and the deletion of the elements of M will be O (M (log (N)). Not what you requested, but perhaps close enough.

+5
source

EJP has this right, I tested it - it gives 8 and 9 when setting input 2.

 import java.util.stream.IntStream; public class Test { private static final int[] arr = {5, 3, 4, 2, 9, 1, 7, 8, 6}; public static void main(String[] args) throws Exception { int n = Integer.parseInt(args[0]); System.out.println("Finding "+n+" largest numbers in arr"); IntStream.of(arr).sorted().skip(arr.length-n).boxed().forEach(big -> System.out.println(big)); } } 
+3
source

If you already use google guava in your project, you can use MinMaxPriorityQueue :

 Collection<..> min5 = stream.collect( toCollection(MinMaxPriorityQueue.maximumSize(5)::create) ); 
+2
source

To solve your problem, you can create a custom collector using the JDK PriorityQueue :

 public static <T> Collector<T, ?, List<T>> maxN(Comparator<? super T> comparator, int limit) { BiConsumer<PriorityQueue<T>, T> accumulator = (queue, t) -> { queue.add(t); if (queue.size() > limit) queue.poll(); }; return Collector.of(() -> new PriorityQueue<>(limit + 1, comparator), accumulator, (q1, q2) -> { for (T t : q2) { accumulator.accept(q1, t); } return q1; }, queue -> new ArrayList<>(queue)); } 

Using:

 int[] arr = {5, 3, 4, 2, 9, 1, 7, 8, 6}; System.out.println(IntStream.of(arr).boxed().collect(maxN(Comparator.naturalOrder(), 2))); // [8, 9] System.out.println(IntStream.of(arr).boxed().collect(maxN(Comparator.reverseOrder(), 3))); // [3, 1, 2] 

This can be faster for large datasets and small constraints as they are not sorted. If you want a sorted result, you can add a sort step in finisher .

+1
source

You can achieve your complexity goal by creating a histogram of values:

 public static IntStream maxValues(IntStream source, int limit) { TreeMap<Integer,Integer> m=new TreeMap<>(); source.forEachOrdered(new IntConsumer() { int size, min=Integer.MIN_VALUE; public void accept(int value) { if(value<min) return; m.merge(value, 1, Integer::sum); if(size<limit) size++; else m.compute(min=m.firstKey(), (k,count)->count==1? null: count-1); } }); if(m.size()==limit)// no duplicates return m.keySet().stream().mapToInt(Integer::valueOf); return m.entrySet().stream().flatMapToInt(e->{ int value = e.getKey(), count = e.getValue(); return count==1? IntStream.of(value): IntStream.range(0, count).map(i->value); }); } 

It creates a map from int values ​​to their corresponding number of occurrences, but limits its contents to the desired number of values, therefore, its work has complexity O(log(M)) (the worst case, if there are no duplicates), and since the operation is performed once for each value, its total complexity is O(NΓ—log(M)) as you wish.

You can test it with the original array as

 int[] arr = {5, 3, 4, 2, 9, 1, 7, 8, 6}; maxValues(Arrays.stream(arr), 3).forEach(System.out::println); 

but to check for some corner cases you can use an array containing duplicates like

 int[] arr = {8, 5, 3, 4, 2, 2, 9, 1, 7, 9, 8, 6}; // note that the stream of three max elements contains one of the two eights 

If you want maximum performance, replacing the box trackcard with an adequate data structure using primitive data types may be feasible, but it will be a small performance optimization, since this solution has already solved the complexity problem.

By the way, this solution works for arbitrary threads, i.e. no need to know the value of N

+1
source

All Articles