Your problem is that part20to3_chaos is an RDD[Int], while OrderedRDDFunctions.repartitionAndSortWithinPartitions is a method that operates on an RDD[(K, V)], where K is the key and V is the value.
repartitionAndSortWithinPartitions first redistributes the data according to the provided partitioner, and then sorts the records within each partition by their keys:
```scala
/**
 * Repartition the RDD according to the given partitioner and,
 * within each resulting partition, sort records by their keys.
 *
 * This is more efficient than calling `repartition` and then sorting within each partition
 * because it can push the sorting down into the shuffle machinery.
 */
def repartitionAndSortWithinPartitions(partitioner: Partitioner): RDD[(K, V)] =
  self.withScope {
    new ShuffledRDD[K, V, V](self, partitioner).setKeyOrdering(ordering)
  }
```
So it looks like this is not exactly what you are looking for.
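For contrast, here is a sketch of how the method is meant to be used, assuming a hypothetical pair RDD where each number is keyed by itself:

```scala
import org.apache.spark.HashPartitioner

// Hypothetical pair RDD: each number keyed by itself, with its square as the value.
val pairs = sc.parallelize(1 to 20, 3).map(n => (n, n * n))

// Redistributes records into 3 partitions by key hash,
// sorting by key within each partition during the shuffle.
val sortedWithin = pairs.repartitionAndSortWithinPartitions(new HashPartitioner(3))
```

Note that this only guarantees ordering within each partition, not a global ordering across the whole RDD.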
If you just need a plain sorted RDD, you can use sortBy, since it does not require a key:
```scala
scala> val toTwenty = sc.parallelize(1 to 20, 3).distinct
toTwenty: org.apache.spark.rdd.RDD[Int] = MapPartitionsRDD[31] at distinct at <console>:33

scala> val sorted = toTwenty.sortBy(identity, true, 3).collect
sorted: Array[Int] = Array(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20)
```
Here you pass sortBy the sort order (ascending or descending) and the number of partitions you want to create.
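Alternatively, if you do want repartitionAndSortWithinPartitions on an RDD[Int], you can first key each element by itself; this is a sketch, and pairing with a RangePartitioner is my own suggestion, since contiguous key ranges per partition make the within-partition sort produce a globally ordered result:

```scala
import org.apache.spark.RangePartitioner

val toTwenty = sc.parallelize(1 to 20, 3).distinct

// Pair each element with a unit value so the implicit conversion
// to OrderedRDDFunctions applies.
val keyed = toTwenty.map(n => (n, ()))

// A RangePartitioner assigns contiguous key ranges to partitions,
// so sorting within each partition yields a globally sorted RDD.
val sorted = keyed
  .repartitionAndSortWithinPartitions(new RangePartitioner(3, keyed))
  .keys
```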