Zeppelin - Cannot query with %sql the table I registered with pyspark

I am new to Spark/Zeppelin and wanted to do a simple exercise: convert a csv file from pandas to a Spark data frame, register it as a table, query it with SQL, and render the result in Zeppelin.

But I seem to fail in the last step.

I am using Spark 1.6.1

Here is my code:

 %pyspark
 spark_clean_df.registerTempTable("table1")
 print spark_clean_df.dtypes
 print sqlContext.sql("select count(*) from table1").collect()

Here is the result:

 [('id', 'bigint'), ('name', 'string'), ('host_id', 'bigint'), ('host_name', 'string'), ('neighbourhood', 'string'), ('latitude', 'double'), ('longitude', 'double'), ('room_type', 'string'), ('price', 'bigint'), ('minimum_nights', 'bigint'), ('number_of_reviews', 'bigint'), ('last_review', 'string'), ('reviews_per_month', 'double'), ('calculated_host_listings_count', 'bigint'), ('availability_365', 'bigint')]
 [Row(_c0=4961)]

But when I try to use %sql, I get this error:

 %sql
 select * from table1

 Table not found: table1; line 1 pos 14
 set zeppelin.spark.sql.stacktrace = true to see full stacktrace

Any help would be appreciated - I don't even know where to find this stack trace or how it could help me.

Thanks :)

+7
apache-spark pyspark apache-spark-sql apache-zeppelin
4 answers

Zeppelin can create different contexts for different interpreters: if you executed some code with the %spark interpreter and some with %pyspark, your Zeppelin note may end up with two contexts. When you then use %sql, it searches a different context than the one %pyspark registered the table in. Try restarting Zeppelin and executing the %pyspark code as the first paragraph and the %sql query as the second.
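The "two contexts" failure mode is the same as with any per-session temporary table. As a rough analogy (using sqlite3 from the Python standard library, not Spark), a temp table created on one connection is invisible on another:

```python
import sqlite3

# Each connection plays the role of a separate SQLContext.
ctx_a = sqlite3.connect(":memory:")
ctx_b = sqlite3.connect(":memory:")

# "Register" a temporary table in context A only.
ctx_a.execute("CREATE TEMP TABLE table1 (id INTEGER)")
ctx_a.execute("INSERT INTO table1 VALUES (1), (2)")

# Context A sees the table...
print(ctx_a.execute("SELECT count(*) FROM table1").fetchone())

# ...but context B fails with "no such table: table1",
# just like %sql failing to find a table registered by %pyspark.
try:
    ctx_b.execute("SELECT count(*) FROM table1")
except sqlite3.OperationalError as e:
    print(e)
```

In Zeppelin the fix is to make sure both interpreters talk to the same context, which is what the suggestions below aim at.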

If you go to the Interpreters tab, you can set zeppelin.spark.sql.stacktrace = true there. After restarting Zeppelin you will see the full stack trace instead of just the "Table not found" message.

Actually, this existing question is probably the answer to yours: "When registering a table using the %pyspark interpreter in Zeppelin, I cannot access the table in %sql".

Try adding

 %pyspark
 sqlContext = sqlc

as the first lines of your note.

+3

This is also related to the different contexts created by Spark. Check the following setting in the Spark interpreter:

 zeppelin.spark.useHiveContext = false 

i.e. make sure the parameter is set to false.

+1

You did not say which interpreter group you are using. If it is livy, you will not be able to access tables registered in %livy.pyspark from %livy.sql. I got this from here:

 for now %livy.sql can only access tables registered by %livy.spark, but not %livy.pyspark and %livy.sparkr.

If you switch to the standard spark interpreter group, it should work; I can confirm this works for me with Spark 1.6.3 and Zeppelin 0.7.0. I hope the people working on the livy interpreter will lift this restriction...

+1

The correct syntax is:

 sqlContext.registerDataFrameAsTable(spark_clean_df, 'table1')
 sqlContext.sql("select * from table1 where ...")
0
