How to pass multiple statements to Spark SQL HiveContext

For example, I have several Hive HQL statements that I want to pass to Spark SQL:

set parquet.compression=SNAPPY;
create table MY_TABLE stored as parquet as select * from ANOTHER_TABLE;
select * from MY_TABLE limit 5;

The following does not work:

hiveContext.sql("set parquet.compression=SNAPPY; create table MY_TABLE stored as parquet as select * from ANOTHER_TABLE; select * from MY_TABLE limit 5;")

How can I pass multiple statements to Spark SQL?

2 answers

Thanks @SamsonScharfrichter for the answer.

This will work:

hiveContext.sql("set spark.sql.parquet.compression.codec=SNAPPY")
hiveContext.sql("create table MY_TABLE stored as parquet as select * from ANOTHER_TABLE")
val rs = hiveContext.sql("select * from MY_TABLE limit 5")

Note that in this particular case we need to use spark.sql.parquet.compression.codec rather than Hive's parquet.compression property.
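
Since hiveContext.sql accepts exactly one statement per call, a compact way to run several in order is to map over a sequence of single statements. This is a sketch, assuming a hiveContext already in scope and the table names from the question:

```scala
// Each element is a single statement; hiveContext.sql runs them one at a time.
val statements = Seq(
  "set spark.sql.parquet.compression.codec=SNAPPY",
  "create table MY_TABLE stored as parquet as select * from ANOTHER_TABLE",
  "select * from MY_TABLE limit 5"
)
val results = statements.map(hiveContext.sql)
results.last.show()  // only the final select produces rows worth displaying
```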


I worked on a script where I needed to read a SQL file and run each of the semicolon-separated queries it contained.

One easy way to do this:

val hsc = new org.apache.spark.sql.hive.HiveContext(sc)
val sqlFile = "/hdfs/path/to/file.sql"
// wholeTextFiles returns (path, content) pairs; take the content of the first file
val queries = sc.wholeTextFiles(sqlFile).take(1)(0)._2
// split on ';', trim whitespace, drop empty fragments, and run each statement
queries.split(';').map(_.trim).filter(_.nonEmpty).map(hsc.sql)
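
One caveat: splitting on a bare ';' breaks any statement that contains a semicolon inside a string literal. A minimal quote-aware splitter (a sketch, not part of the original answer) could look like this in plain Scala:

```scala
// Splits SQL text on ';' while ignoring semicolons inside
// single-quoted string literals. Illustrative only: it does not
// handle comments or backslash escapes.
def splitStatements(sql: String): Seq[String] = {
  val parts = scala.collection.mutable.Buffer(new StringBuilder)
  var inQuote = false
  for (c <- sql) {
    if (c == '\'') { inQuote = !inQuote; parts.last.append(c) }
    else if (c == ';' && !inQuote) parts += new StringBuilder
    else parts.last.append(c)
  }
  parts.map(_.toString.trim).filter(_.nonEmpty).toSeq
}
```

With that helper, the last line above becomes splitStatements(queries).map(hsc.sql).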
