Given a large collection (let's call it 'a') of elements of type T (say, a Vector or a List) and an evaluation function 'f' (say, (T) => Double), I would like to derive from 'a' a result collection 'b' containing the N elements of 'a' that yield the highest values under f. The collection 'a' may contain duplicates, and it is not sorted.
Leaving the question of parallelism (map/reduce, etc.) aside for the moment, what would be the appropriate Scala data structure for building the result collection 'b'? Thanks for any pointers / ideas.
Notes:
(1) I suppose my use case can be expressed most succinctly as

    val a = Vector( 9,2,6,1,7,5,2,6,9 )    // just an example
    val f : (Int)=>Double = (n)=>n         // evaluation function
    val b = a.sortBy( f ).takeRight( N )   // sort ascending, then keep the N highest

except that I do not want to sort the entire set.
(2) One option would be to iterate over 'a', filling a TreeSet with a "manual" size bound (reject anything worse than the worst element currently in the set, and never let the set grow beyond N). However, I would like duplicates that are present in the source collection to be kept in the result, and a set would drop them, so this may not work.
(3) If a sorted multiset is the right data structure, is there a Scala implementation of one? Or, if the result set is small enough, would a Vector or Array kept sorted by binary insertion do?
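
To make the idea in note (2) concrete, here is a minimal sketch of the bounded approach, with the TreeSet swapped for a `scala.collection.mutable.PriorityQueue` used as a min-heap so that duplicates are kept. The name `topN` and the (score, element) pairing are my own assumptions for illustration, not an existing library API, and this is only one possible way to do it.

```scala
import scala.collection.mutable

// Hypothetical helper: returns the n elements of `a` with the highest f-value,
// keeping duplicates, without sorting all of `a`.
def topN[T](a: Iterable[T], n: Int)(f: T => Double): Vector[T] = {
  // PriorityQueue is a max-heap with respect to its Ordering. Reversing the
  // comparison on the score makes `head` the lowest-scoring element kept so far.
  val worstFirst = Ordering.fromLessThan[(Double, T)](_._1 > _._1)
  val heap = mutable.PriorityQueue.empty[(Double, T)](worstFirst)

  for (x <- a) {
    val score = f(x)
    if (heap.size < n) {
      heap.enqueue((score, x))           // still room: accept everything
    } else if (n > 0 && score > heap.head._1) {
      heap.dequeue()                     // evict the current worst
      heap.enqueue((score, x))
    }                                    // otherwise reject x
  }

  // Only the (at most n) survivors are sorted, best first.
  heap.toVector.sortWith(_._1 > _._1).map(_._2)
}

val a = Vector(9, 2, 6, 1, 7, 5, 2, 6, 9)
val b = topN(a, 3)(_.toDouble)           // Vector(9, 9, 7)
```

The heap never holds more than N entries, so the cost is roughly O(|a| log N) instead of the O(|a| log |a|) of a full sort. Ties with the current worst element are rejected here; whether that is the right policy depends on the use case.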