Why can't the Spark / Scala compiler find toDF on RDD[Map[Int, Int]]?

Why does the following error occur?

 scala> import sqlContext.implicits._
 import sqlContext.implicits._

 scala> val rdd = sc.parallelize(1 to 10).map(x => (Map(x -> 0), 0))
 rdd: org.apache.spark.rdd.RDD[(scala.collection.immutable.Map[Int,Int], Int)] = MapPartitionsRDD[20] at map at <console>:27

 scala> rdd.toDF
 res8: org.apache.spark.sql.DataFrame = [_1: map<int,int>, _2: int]

 scala> val rdd = sc.parallelize(1 to 10).map(x => Map(x -> 0))
 rdd: org.apache.spark.rdd.RDD[scala.collection.immutable.Map[Int,Int]] = MapPartitionsRDD[23] at map at <console>:27

 scala> rdd.toDF
 <console>:30: error: value toDF is not a member of org.apache.spark.rdd.RDD[scala.collection.immutable.Map[Int,Int]]
               rdd.toDF

So what exactly is happening here? toDF can convert an RDD of type (scala.collection.immutable.Map[Int,Int], Int) to a DataFrame, but not an RDD of type scala.collection.immutable.Map[Int,Int]. Why is this?

2 answers

For the same reason that you cannot use

 sqlContext.createDataFrame((1 to 10).map(x => Map(x -> 0)))

If you look at the source of org.apache.spark.sql.SQLContext, you will find, among others, two overloads of the createDataFrame method:

 def createDataFrame[A <: Product : TypeTag](rdd: RDD[A]): DataFrame 

and

 def createDataFrame[A <: Product : TypeTag](data: Seq[A]): DataFrame 

As you can see, both require A to be a subclass of Product. When you call toDF on an RDD[(Map[Int,Int], Int)], it works because Tuple2 is indeed a Product. Map[Int,Int] by itself is not, hence the error.

You can get it working by wrapping the Map in a Tuple1:

 sc.parallelize(1 to 10).map(x => Tuple1(Map(x -> 0))).toDF 
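
The Product bound can be seen in plain Scala, without Spark. The sketch below is only an illustration: `arities` is a hypothetical helper that carries the same `A <: Product` upper bound as `createDataFrame` (without the `TypeTag`), and `Wrapped` is a hypothetical case class, shown as an alternative to `Tuple1` because every case class is also a Product.

```scala
object ProductBoundSketch {
  // Hypothetical helper with the same upper bound as createDataFrame: A must be a Product.
  def arities[A <: Product](rows: Seq[A]): Seq[Int] = rows.map(_.productArity)

  // Hypothetical case class; like any case class, it extends Product.
  final case class Wrapped(counts: Map[Int, Int])

  val maps: Seq[Map[Int, Int]] = (1 to 3).map(x => Map(x -> 0))

  // Tuple2 is a Product, so this compiles.
  val fromPairs: Seq[Int] = arities(maps.map(m => (m, 0)))

  // Tuple1 and a case class both satisfy the bound as well.
  val fromTuple1: Seq[Int] = arities(maps.map(m => Tuple1(m)))
  val fromCaseClass: Seq[Int] = arities(maps.map(m => Wrapped(m)))

  // arities(maps) // does not compile: Map[Int, Int] does not conform to the bound A <: Product
}
```

With Spark, a case class wrapper would presumably also give the resulting column a meaningful name instead of `_1`.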

Basically, because there is no implicit conversion that creates a DataFrame from an RDD whose elements are plain Maps.

In the first example, the RDD contains tuples. A tuple is a Product, and for Products an implicit conversion exists:

 rddToDataFrameHolder[A <: Product : TypeTag](rdd: RDD[A])

In the second example, the RDD contains bare Maps. A Map is not a Product, so no implicit conversion applies and the compiler cannot find toDF.
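
The mechanism can be reproduced in a few lines of plain Scala. This is a minimal sketch, not Spark's code: `Holder` and `seqToHolder` are hypothetical stand-ins for `DataFrameHolder` and `rddToDataFrameHolder`, `Seq` replaces `RDD`, the `TypeTag` is omitted, and `toDF` only returns a description string.

```scala
import scala.language.implicitConversions

object ImplicitHolderSketch {
  // Hypothetical stand-in for Spark's DataFrameHolder; toDF only describes the rows here.
  final class Holder[A <: Product](rows: Seq[A]) {
    def toDF: String = s"${rows.size} rows of arity ${rows.head.productArity}"
  }

  // Same shape as rddToDataFrameHolder, with Seq in place of RDD and no TypeTag.
  implicit def seqToHolder[A <: Product](rows: Seq[A]): Holder[A] = new Holder(rows)

  val pairs: Seq[(Map[Int, Int], Int)] = Seq((Map(1 -> 0), 0), (Map(2 -> 0), 0))
  val maps: Seq[Map[Int, Int]] = Seq(Map(1 -> 0), Map(2 -> 0))

  // Seq has no toDF member, so the compiler applies seqToHolder; tuples satisfy the bound.
  val fromPairs: String = pairs.toDF

  // Wrapping each Map in Tuple1 makes the conversion applicable again.
  val fromWrapped: String = maps.map(Tuple1(_)).toDF

  // maps.toDF // does not compile: value toDF is not a member of Seq[Map[Int,Int]]
}
```

Because the conversion is not applicable to `Seq[Map[Int, Int]]`, the compiler reports a missing member rather than a violated type bound, which matches the error message in the question.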

