How to deal with the SPARK-5063 error in Spark

I get the SPARK-5063 error on the println line:

 d.foreach { x => for (i <- 0 until x.length) println(m.lookup(x(i))) }

where d is an RDD[Array[String]] and m is an RDD[(String, String)]. Is there a way to print the values the way I want? Or how can I convert d from RDD[Array[String]] to Array[String]?

1 answer

SPARK-5063 relates to better error messages when attempting to nest RDD operations, which is not supported.

It is a usability issue rather than a functional one. The root cause is the nesting of RDD operations, and the fix is to break that nesting up.
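To make that concrete: the most direct way to break up the nesting in the snippet above is to collect d to the driver first, so that m.lookup is no longer called from inside another RDD operation. A minimal sketch, assuming d is small enough to collect:

 // RDD[Array[String]] -> Array[String], pulled to the driver
 val local: Array[String] = d.flatMap(_.toSeq).collect()
 // m.lookup now runs on the driver, which is allowed (it returns a Seq of matches)
 local.foreach(s => println(m.lookup(s)))

Each lookup call launches its own Spark job, though, so this only makes sense for tiny data; the approaches below scale far better.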

Here we are trying to join dRDD with mRDD. If mRDD is large, rdd.join would be the recommended way; otherwise, if mRDD is small, i.e. it fits in the memory of each executor, we can collect it, broadcast it, and do a map-side join.

Join

A simple join would look like this:

 // Two example RDDs standing in for d and m
 val rdd = sc.parallelize(Seq(Array("one", "two", "three"), Array("four", "five", "six")))
 val map = sc.parallelize(Seq("one" -> 1, "two" -> 2, "three" -> 3, "four" -> 4, "five" -> 5, "six" -> 6))
 // Flatten the arrays and key every element by itself so it can be joined
 val flat = rdd.flatMap(_.toSeq).keyBy(x => x)
 // Join on the element and keep only the joined values
 val res = flat.join(map).map { case (k, v) => v }
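To actually print the result, as the question asks, collect it back to the driver first; a minimal usage sketch, assuming res fits in driver memory:

 res.collect().foreach(println) // prints (one,1), (two,2), ...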

If we want to use a broadcast, we first need to collect the lookup table locally on the driver so that it can be broadcast to all the executors. NOTE: the RDD to be broadcast MUST fit in the memory of the driver as well as of each executor.

Broadcast Map-Side Join

 // Same example RDDs as above
 val rdd = sc.parallelize(Seq(Array("one", "two", "three"), Array("four", "five", "six")))
 val map = sc.parallelize(Seq("one" -> 1, "two" -> 2, "three" -> 3, "four" -> 4, "five" -> 5, "six" -> 6))
 // Collect the lookup table on the driver and broadcast it to every executor
 val bcTable = sc.broadcast(map.collectAsMap)
 // Resolve each element against the broadcast table; no shuffle is required
 val res2 = rdd.flatMap { arr => arr.map(elem => (elem, bcTable.value(elem))) }
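As with the join, printing happens on the driver after a collect. One caveat, not from the original answer: bcTable.value(elem) throws a NoSuchElementException when an element is missing from the table, so looking keys up with Map.get is the safer variant when the keys are not guaranteed to be present:

 res2.collect().foreach(println) // prints (one,1), (two,2), ...
 // Safer variant: silently skip elements that have no entry in the table
 val res3 = rdd.flatMap(arr => arr.flatMap(elem => bcTable.value.get(elem).map(v => (elem, v))))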