You can use org.apache.spark.sql.hive.HiveContext to execute an SQL query on Hive tables.
You can also point Spark directly at the HDFS base directory where the table data is actually stored. This bypasses the Hive metastore and SQL parsing, so it can be more efficient, but you then need to know the file format yourself (and the schema, for formats that are not self-describing).
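A rough PySpark sketch of the two options, assuming a Spark 1.4+ or 2.x installation where HiveContext is still available, a table stored as Parquet, and the default warehouse location /user/hive/warehouse. The function names and the table_location helper are made up for illustration; check hive.metastore.warehouse.dir and the table's actual storage format on your cluster.

```python
# Default Hive warehouse root; the real value is whatever
# hive.metastore.warehouse.dir is set to on your cluster (assumption).
WAREHOUSE_DIR = "/user/hive/warehouse"


def table_location(database, table, warehouse_dir=WAREHOUSE_DIR):
    """Build the HDFS directory of a managed Hive table (hypothetical helper)."""
    # Tables in the default database sit directly under the warehouse root;
    # other databases get their own "<name>.db" directory.
    if database == "default":
        return "{}/{}".format(warehouse_dir, table)
    return "{}/{}.db/{}".format(warehouse_dir, database, table)


def query_via_hive(sc, database, table):
    """Option 1: run SQL against the Hive table through HiveContext."""
    from pyspark.sql import HiveContext  # lazy import: needs a Spark install
    hive_ctx = HiveContext(sc)
    return hive_ctx.sql("SELECT * FROM {}.{}".format(database, table))


def read_files_directly(sc, database, table):
    """Option 2: skip Hive and load the table's files from HDFS."""
    from pyspark.sql import SQLContext  # lazy import: needs a Spark install
    sql_ctx = SQLContext(sc)
    # Assumes Parquet; use the reader that matches your table's format.
    return sql_ctx.read.parquet(table_location(database, table))
```

Both functions take an existing SparkContext (sc) and return a DataFrame. External tables can live anywhere on HDFS, so for those you would pass their real location instead of relying on table_location.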
If the Hive cluster is external, you need to set hive.metastore.uris so that Spark can reach the remote metastore.
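One way to set that property from PySpark is sketched below. The host name is a placeholder, 9083 is the usual default metastore port, and both function names are hypothetical. Depending on your Spark version, setting the property after the context is created may not take effect; in that case put the same property in a hive-site.xml inside Spark's conf directory.

```python
def metastore_uri(host, port=9083):
    """Format a Thrift URI for a Hive metastore (9083 is the usual default port)."""
    return "thrift://{}:{}".format(host, port)


def hive_context_for_remote_metastore(sc, host, port=9083):
    """Create a HiveContext pointed at an external metastore (illustrative only)."""
    from pyspark.sql import HiveContext  # lazy import: needs a Spark install
    hive_ctx = HiveContext(sc)
    # Tell Spark where the remote metastore's Thrift service is listening.
    hive_ctx.setConf("hive.metastore.uris", metastore_uri(host, port))
    return hive_ctx
```

For example, hive_context_for_remote_metastore(sc, "metastore.example.com") would configure thrift://metastore.example.com:9083.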