How to use RangePartitioner in Spark

I want to use RangePartitioner in my Java Spark application, but I don't know how to set the two Scala parameters scala.math.Ordering&lt;K&gt; evidence$1 and scala.reflect.ClassTag&lt;K&gt; evidence$2. Can someone give me an example?

Here is a link to the Javadoc for RangePartitioner (it did not help me, because I am new to Spark and Scala ...):

My code looks like this:

    JavaPairRDD<Integer, String> partitionedRDD = rdd.partitionBy(
        new RangePartitioner<Integer, String>(10, rdd, true, evidence$1, evidence$2));
Tags: scala, scala-java-interop, partitioning, apache-spark
2 answers

If you look at the API docs for OrderedRDDFunctions, there is an example of how you can set an implicit ordering for your key:

    import org.apache.spark.SparkContext._

    val rdd: RDD[(String, Int)] = ...

    implicit val caseInsensitiveOrdering = new Ordering[String] {
      override def compare(a: String, b: String) = a.toLowerCase.compare(b.toLowerCase)
    }

I know this snippet is from the Spark Scala API, but it at least shows how you can create your Ordering parameter. For the ClassTag, I suggest checking a general Scala doc or forum. I am adding the scala tag to the question as well.
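For the ClassTag part, one option from Java is to go through the ClassTag companion object. A minimal sketch (assuming Spark's bundled Scala library is on the classpath):

    import scala.reflect.ClassTag;
    import scala.reflect.ClassTag$;

    // ClassTag$.MODULE$ is the singleton instance of Scala's ClassTag companion
    // object; apply() builds a tag for the given runtime class.
    final ClassTag<Integer> keyTag = ClassTag$.MODULE$.apply(Integer.class);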


You can create both the Ordering and the ClassTag by calling methods on their companion objects.

In Java, these are referenced as follows: ClassName$.MODULE$.functionName()

Another wrinkle is that the constructor requires a Scala RDD, not a Java one. You can get the Scala RDD from a Java PairRDD by calling rdd.rdd().

    import java.util.Comparator;

    import org.apache.spark.RangePartitioner;
    import org.apache.spark.api.java.JavaPairRDD;

    import scala.math.Ordering;
    import scala.math.Ordering$;
    import scala.reflect.ClassTag;
    import scala.reflect.ClassTag$;

    final Ordering<Integer> ordering =
        Ordering$.MODULE$.comparatorToOrdering(Comparator.<Integer>naturalOrder());
    final ClassTag<Integer> classTag = ClassTag$.MODULE$.apply(Integer.class);

    final RangePartitioner<Integer, String> partitioner = new RangePartitioner<>(
        10,
        rdd.rdd(), // note the call to rdd.rdd() here; it gets the Scala RDD backing the Java one
        true,
        ordering,
        classTag);

    final JavaPairRDD<Integer, String> partitioned = rdd.partitionBy(partitioner);
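Putting it all together, here is a minimal end-to-end sketch; the class name, local master, and sample data are made up for illustration:

    import java.util.Arrays;
    import java.util.Comparator;

    import org.apache.spark.RangePartitioner;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    import scala.Tuple2;
    import scala.math.Ordering;
    import scala.math.Ordering$;
    import scala.reflect.ClassTag;
    import scala.reflect.ClassTag$;

    public final class RangePartitionerDemo {
        public static void main(String[] args) {
            // Local master and app name are illustrative only.
            final JavaSparkContext sc = new JavaSparkContext("local[*]", "range-partitioner-demo");
            final JavaPairRDD<Integer, String> rdd = sc.parallelizePairs(Arrays.asList(
                new Tuple2<Integer, String>(3, "c"),
                new Tuple2<Integer, String>(1, "a"),
                new Tuple2<Integer, String>(2, "b")));

            final Ordering<Integer> ordering =
                Ordering$.MODULE$.comparatorToOrdering(Comparator.<Integer>naturalOrder());
            final ClassTag<Integer> classTag = ClassTag$.MODULE$.apply(Integer.class);

            // Two partitions requested; RangePartitioner may create fewer
            // if it samples only a few distinct keys.
            final JavaPairRDD<Integer, String> partitioned = rdd.partitionBy(
                new RangePartitioner<Integer, String>(2, rdd.rdd(), true, ordering, classTag));

            System.out.println(partitioned.getNumPartitions());
            sc.stop();
        }
    }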
