, mapPartitions.
rdd.mapPartitions(Random.shuffle(_));
For PairRDD(RDD type RDD[(K, V)]), if you are interested in shuffling key value mappings (matching an arbitrary key with an arbitrary value):
pairRDD.mapPartitions(iterator => {
val (keySequence, valueSequence) = iterator.toSeq.unzip
val shuffledValueSequence = Random.shuffle(valueSequence)
keySequence.zip(shuffledValueSequence).toIterator
}, true)
The Boolean flag at the end means that the partitioning is saved (keys are not changed) for this operation, so that subsequent operations, for example, reduceByKeycan be optimized (avoid shuffling).
source
share