This is not a mistake, but sorry for the confusion! Prior to Spark 1.3, Spark SQL was marked as an alpha component because the APIs were still in flux. With Spark 1.3 we have finished and stabilized the API. A full description of what you need to do when porting can be found in the documentation.
I can also answer your specific questions and provide some justification for why we made these changes.
Your code that stopped working in 1.3.0 (it simply doesn't compile, and the only change was moving to 1.3):

    jsonResult.zipWithUniqueId() // since RDDApi doesn't implement that method
DataFrame is now a single unified interface for both Scala and Java. However, since we must maintain compatibility with the existing RDD API for the rest of 1.x, DataFrames are not RDDs. To get an RDD view, you can call df.rdd or df.javaRDD.
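As a minimal sketch (assuming Spark 1.3+ with a SQLContext named sqlContext and a SparkContext named sc), dropping from a DataFrame down to the RDD view looks like this:

    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.Row

    val df = sqlContext.jsonRDD(
      sc.parallelize("""{"name": "Michael", "zip": 94709}""" :: Nil))

    val rowRdd: RDD[Row] = df.rdd   // plain Scala RDD of Rows
    val javaRowRdd = df.javaRDD     // JavaRDD[Row] view for the Java API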
Also, since we were worried about the confusion that implicit conversions can cause, we made it so that you have to explicitly call rdd.toDF to invoke the conversion from an RDD. However, this conversion is only available automatically when your RDD holds objects that inherit from Product (for example, tuples or case classes).
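Here is a sketch of the explicit RDD-to-DataFrame direction (again assuming a SQLContext named sqlContext; the Person case class is only an illustration):

    import sqlContext.implicits._

    case class Person(name: String, zip: Int)  // a case class extends Product, so toDF is available

    val peopleRdd = sc.parallelize(Seq(Person("Michael", 94709)))
    val peopleDF = peopleRdd.toDF()            // explicit call; nothing is converted implicitly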
Back to your original question: if you want to do a transformation on rows with an arbitrary schema, you need to explicitly tell Spark SQL about the structure of the data after your map operation (since the compiler cannot infer it).
    import org.apache.spark.sql.Row
    import org.apache.spark.sql.types._

    val jsonData = sqlContext.jsonRDD(
      sc.parallelize("""{"name": "Michael", "zip": 94709}""" :: Nil))

    // zipWithUniqueId produces Long ids, so declare the new column as LongType
    val newSchema = StructType(
      StructField("uniqueId", LongType) +: jsonData.schema.fields)

    val augmentedRows = jsonData.rdd.zipWithUniqueId.map {
      case (row, id) => Row.fromSeq(id +: row.toSeq)
    }

    val newDF = sqlContext.createDataFrame(augmentedRows, newSchema)
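If it helps, you can sanity-check the result afterwards (just a usage sketch; the exact output depends on your data):

    newDF.printSchema()  // should list uniqueId followed by the fields inferred from the JSON
    newDF.show()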