Most likely, you have already studied the source code:
class OrderedRDDFunctions { // <snip> def sortByKey(ascending: Boolean = true, numPartitions: Int = self.partitions.size): RDD[P] = { val part = new RangePartitioner(numPartitions, self, ascending) val shuffled = new ShuffledRDD[K, V, P](self, part) shuffled.mapPartitions(iter => { val buf = iter.toArray if (ascending) { buf.sortWith((x, y) => x._1 < y._1).iterator } else { buf.sortWith((x, y) => x._1 > y._1).iterator } }, preservesPartitioning = true) }
And, as you say, these integers must go through the shuffling stage - as can be seen from the fragment.
However, your concern about the subsequent call to take (K) may not be as accurate. This operation does NOT cycle through all N elements:
def take(num: Int): Array[T] = {
So it would seem:
O (myRdd.take (K)) <O (myRdd.sortByKey ()) ~ = O (myRdd.sortByKey.take (k)) (at least for small K) O (myRdd.sortByKey (). Collect ( )
javadba May 24 '14 at 7:23 2014-05-24 07:23
source share