I am new to Spark / Zeppelin and wanted to do a simple exercise: convert a csv file loaded with pandas into a Spark data frame, register it as a table, query it with SQL and render the result with Zeppelin.
But I seem to fail in the last step.
I am using Spark 1.6.1
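For context, the pandas-to-Spark conversion step (not shown in the snippet below) looks roughly like this - clean_df and the file name are just placeholders:

%pyspark
import pandas as pd

# Read the csv with pandas and convert the result to a Spark DataFrame
# (the file name "listings.csv" is only an example)
clean_df = pd.read_csv("listings.csv")
spark_clean_df = sqlContext.createDataFrame(clean_df)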
Here is my code:
%pyspark
spark_clean_df.registerTempTable("table1")
print spark_clean_df.dtypes
print sqlContext.sql("select count(*) from table1").collect()
Here is the result:
[('id', 'bigint'), ('name', 'string'), ('host_id', 'bigint'), ('host_name', 'string'), ('neighbourhood', 'string'), ('latitude', 'double'), ('longitude', 'double'), ('room_type', 'string'), ('price', 'bigint'), ('minimum_nights', 'bigint'), ('number_of_reviews', 'bigint'), ('last_review', 'string'), ('reviews_per_month', 'double'), ('calculated_host_listings_count', 'bigint'), ('availability_365', 'bigint')]
[Row(_c0=4961)]
But when I try to use %sql, I get this error:

%sql
select * from table1

Table not found: table1; line 1 pos 14
set zeppelin.spark.sql.stacktrace = true to see full stacktrace
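If it is relevant, this is how I would double-check from %pyspark which tables the SQLContext knows about (tableNames() is from the Spark 1.6 SQLContext API; I have not pasted its output here):

%pyspark
# Sanity check: list the temp tables visible to this SQLContext
# ("table1" should appear here if the registration worked)
print sqlContext.tableNames()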
Any help would be appreciated - I don't even know where to find this stacktrace and how it can help me.
Thanks:)
apache-spark pyspark apache-spark-sql apache-zeppelin
StefanK