I don't think there is a dedicated API for sorting a pair RDD by value.
You may need to follow these steps:
1) Swap the key and value
2) Sort with the sortByKey API
3) Swap the key and value back
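To see what the three steps do to the data, here is a minimal plain-Java sketch that uses java.util collections as a stand-in for the RDD. No Spark is involved, and the class name SwapSortSwapSketch, the method name sortByValueDescending and the sample words are hypothetical, chosen only for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java stand-in for the swap / sortByKey / swap steps (no Spark involved).
class SwapSortSwapSketch {

    // Returns the (word, count) pairs ordered by count, largest first.
    static List<Map.Entry<String, Integer>> sortByValueDescending(Map<String, Integer> counts) {
        // Step 1: swap each (word, count) into (count, word).
        List<Map.Entry<Integer, String>> swapped = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            swapped.add(new SimpleEntry<>(e.getValue(), e.getKey()));
        }

        // Step 2: sort by the new key (the count), in descending order.
        swapped.sort(Map.Entry.<Integer, String>comparingByKey().reversed());

        // Step 3: swap back into (word, count).
        List<Map.Entry<String, Integer>> result = new ArrayList<>();
        for (Map.Entry<Integer, String> e : swapped) {
            result.add(new SimpleEntry<>(e.getValue(), e.getKey()));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("rdd", 2);
        counts.put("spark", 5);
        counts.put("java", 3);
        for (Map.Entry<String, Integer> e : sortByValueDescending(counts)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

In Spark the same idea applies, except that each step is a transformation on a JavaPairRDD rather than a loop over a list.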
For more details on sortByKey, see the reference below:
https://spark.apache.org/docs/1.0.0/api/java/org/apache/spark/api/java/JavaPairRDD.html#sortByKey%28boolean%29
For the swap, we can use the swap() method of Scala's Tuple2:
http://www.scala-lang.org/api/current/index.html#scala.Tuple2
For example, I have a JavaPairRDD built by the reduceByKey call below.
JavaPairRDD<String, Integer> counts = ones.reduceByKey(
    new Function2<Integer, Integer, Integer>() {
        @Override
        public Integer call(Integer i1, Integer i2) {
            // Add up the counts that share the same key.
            return i1 + i2;
        }
    });
Now, to swap the key and value, you can use the code below:
JavaPairRDD<Integer, String> swappedPair = counts.mapToPair(
    new PairFunction<Tuple2<String, Integer>, Integer, String>() {
        @Override
        public Tuple2<Integer, String> call(Tuple2<String, Integer> item) throws Exception {
            // (word, count) becomes (count, word).
            return item.swap();
        }
    });
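The snippet above covers only step 1. A rough end-to-end sketch of all three steps might look like the following. It assumes Java 8 lambdas, a local master, and Spark on the classpath. The class name SortCountsByValue and the sample pairs are hypothetical and stand in for the counts RDD. Passing false to sortByKey requests descending order.

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class SortCountsByValue {
    public static void main(String[] args) {
        // Assumption: run locally; adjust the master for a real cluster.
        SparkConf conf = new SparkConf().setAppName("SortCountsByValue").setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Hypothetical (word, count) pairs standing in for the counts RDD.
        JavaPairRDD<String, Integer> counts = sc.parallelizePairs(Arrays.asList(
                new Tuple2<String, Integer>("rdd", 2),
                new Tuple2<String, Integer>("spark", 5),
                new Tuple2<String, Integer>("java", 3)));

        // Step 1: swap so the count becomes the key.
        JavaPairRDD<Integer, String> swapped = counts.mapToPair(item -> item.swap());

        // Step 2: sort by key; false requests descending order.
        JavaPairRDD<Integer, String> sorted = swapped.sortByKey(false);

        // Step 3: swap back so the word is the key again.
        JavaPairRDD<String, Integer> sortedByValue = sorted.mapToPair(item -> item.swap());

        // Print each (word, count) pair in the sorted order.
        for (Tuple2<String, Integer> t : sortedByValue.collect()) {
            System.out.println(t._1() + " " + t._2());
        }

        sc.stop();
    }
}
```

On a Java version without lambdas, each mapToPair call can instead take an anonymous PairFunction like the one shown above, with the type parameters reversed for the swap back.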
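Hope this helps. Take care with the data types: after the swap the key is an Integer and the value is a String, so the type parameters of PairFunction and of the resulting JavaPairRDD have to change accordingly.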