Moving a Spark DataFrame from Python to Scala within Zeppelin

I created a Spark DataFrame in a Python paragraph in Zeppelin:

 sqlCtx = SQLContext(sc)
 spDf = sqlCtx.createDataFrame(df)

where df is a pandas DataFrame:

 print(type(df))
 # <class 'pandas.core.frame.DataFrame'>

What I want to do is move spDf from the Python paragraph to a Scala paragraph. Is z.put a smart way to do this?

 z.put("spDf", spDf) 

but I got this error:

 AttributeError: 'DataFrame' object has no attribute '_get_object_id' 

Any suggestion to fix the error, or any other way to move spDf?

python scala apache-spark spark-dataframe apache-zeppelin
1 answer

You can put the internal Java object rather than the Python wrapper:

 %pyspark
 df = sc.parallelize([(1, "foo"), (2, "bar")]).toDF(["k", "v"])
 z.put("df", df._jdf)

and then make sure you use the correct type:

 val df = z.get("df").asInstanceOf[org.apache.spark.sql.DataFrame]
 // df: org.apache.spark.sql.DataFrame = [k: bigint, v: string]
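As for why the direct z.put(spDf) fails: Zeppelin hands the object to py4j, which calls _get_object_id on it to find the corresponding JVM object. A pyspark DataFrame is a plain Python wrapper without that method; only its _jdf attribute is a JVM handle. A toy sketch of this distinction (JavaObjectStub and PySparkDataFrameStub are illustrative stand-ins, not the real py4j/pyspark types):

```python
class JavaObjectStub:
    """Stand-in for a py4j JavaObject, which exposes _get_object_id."""
    def _get_object_id(self):
        return "o123"

class PySparkDataFrameStub:
    """Stand-in for pyspark's DataFrame: a Python wrapper holding a JVM handle in _jdf."""
    def __init__(self):
        self._jdf = JavaObjectStub()

df = PySparkDataFrameStub()
# z.put serializes its argument via py4j, which calls _get_object_id on it:
print(hasattr(df, "_get_object_id"))       # False -> AttributeError for the wrapper
print(hasattr(df._jdf, "_get_object_id"))  # True  -> z.put("df", df._jdf) succeeds
```

This is why putting df._jdf works while putting df itself raises AttributeError: 'DataFrame' object has no attribute '_get_object_id'.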

but it is better to register a temporary table:

 %pyspark
 # use registerTempTable in Spark 1.x
 df.createTempView("df")

and use SQLContext.table to read it:

 // use sqlContext.table in Spark 1.x
 val df = spark.table("df")
 // df: org.apache.spark.sql.DataFrame = [k: bigint, v: string]

For the conversion in the opposite direction, see Zeppelin: Scala Dataframe to python.
