Moving a Spark DataFrame from Python to Scala within Zeppelin

I created a Spark DataFrame in a Python paragraph in Zeppelin:

 sqlCtx = SQLContext(sc)
 spDf = sqlCtx.createDataFrame(df)

where df is a pandas DataFrame:

 print(type(df))
 # <class 'pandas.core.frame.DataFrame'>

What I want to do is move spDf from the Python paragraph to a Scala paragraph. Is z.put a smart way to do this?

 z.put("spDf", spDf) 

but I got this error:

 AttributeError: 'DataFrame' object has no attribute '_get_object_id' 

Any suggestion to fix the error, or any other way to move spDf?

python scala apache-spark spark-dataframe apache-zeppelin
1 answer

You can put the internal Java object rather than the Python wrapper:

 %pyspark
 df = sc.parallelize([(1, "foo"), (2, "bar")]).toDF(["k", "v"])
 z.put("df", df._jdf)

and then make sure you use the correct type:

 val df = z.get("df").asInstanceOf[org.apache.spark.sql.DataFrame]
 // df: org.apache.spark.sql.DataFrame = [k: bigint, v: string]
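As for why the direct z.put(spDf) fails: Zeppelin hands the object to py4j, which calls _get_object_id on it to find the corresponding JVM object. A pyspark DataFrame is a plain Python wrapper without that method; only its _jdf attribute is a JVM handle. A toy sketch of this distinction (JavaObjectStub and PySparkDataFrameStub are illustrative stand-ins, not the real py4j/pyspark types):

```python
class JavaObjectStub:
    """Stand-in for a py4j JavaObject, which exposes _get_object_id."""
    def _get_object_id(self):
        return "o123"

class PySparkDataFrameStub:
    """Stand-in for pyspark's DataFrame: a Python wrapper holding a JVM handle in _jdf."""
    def __init__(self):
        self._jdf = JavaObjectStub()

df = PySparkDataFrameStub()
# z.put serializes its argument via py4j, which calls _get_object_id on it:
print(hasattr(df, "_get_object_id"))       # False -> AttributeError for the wrapper
print(hasattr(df._jdf, "_get_object_id"))  # True  -> z.put("df", df._jdf) succeeds
```

This is why putting df._jdf works while putting df itself raises AttributeError: 'DataFrame' object has no attribute '_get_object_id'.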

but it is better to register a temporary table:

 %pyspark
 # use registerTempTable in Spark 1.x
 df.createTempView("df")

and use SQLContext.table to read it:

 // use sqlContext.table in Spark 1.x
 val df = spark.table("df")
 // df: org.apache.spark.sql.DataFrame = [k: bigint, v: string]

For the conversion in the opposite direction, see Zeppelin: Scala Dataframe to python.
