Connecting Apache Spark to Apache Hive remotely

I can load data from the Hive server in the same cluster where Apache Spark is installed. But how can I load data into a DataFrame from a remote Hive server? Is a JDBC connector the only option for Hive?

Any suggestions on how I can do this?

1 answer

You can use org.apache.spark.sql.hive.HiveContext to execute SQL queries against Hive tables.
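
For example, a minimal sketch in Scala using the Spark 1.x-style API; the database and table names (default.my_table) are placeholders, not something taken from the question:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object HiveQueryExample {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("hive-query-example")
        val sc = new SparkContext(conf)
        val hiveContext = new HiveContext(sc)

        // Run a SQL query against a Hive table; the result is a DataFrame.
        val df = hiveContext.sql("SELECT * FROM default.my_table LIMIT 10")
        df.show()
      }
    }

In Spark 2.x and later, SparkSession.builder().enableHiveSupport() plays the same role as HiveContext.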

You can also point Spark directly at the HDFS directory where the data is actually stored. This can be more efficient, since Spark does not need to parse a SQL query or go through the metastore to apply the Hive schema to the files.
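
As a rough sketch, assuming the table's files live under a hypothetical HDFS path hdfs://namenode:8020/warehouse/my_table and are stored as Parquet:

    import org.apache.spark.sql.SparkSession

    object HdfsReadExample {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("hdfs-read-example")
          .getOrCreate()

        // Read the files directly from HDFS, bypassing the Hive metastore.
        val df = spark.read.parquet("hdfs://namenode:8020/warehouse/my_table")
        df.show()
      }
    }

The trade-off is that you have to know the storage path and file format yourself instead of letting the metastore resolve them.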

If the cluster is external, you need to set hive.metastore.uris so that Spark can reach the remote Hive metastore.
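
A minimal sketch of that configuration, assuming a hypothetical metastore host remote-metastore-host listening on the default Thrift port 9083:

    import org.apache.spark.sql.SparkSession

    object RemoteMetastoreExample {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("remote-metastore-example")
          // Point Spark at the remote Hive metastore instead of a local one.
          .config("hive.metastore.uris", "thrift://remote-metastore-host:9083")
          .enableHiveSupport()
          .getOrCreate()

        // Tables registered in the remote metastore are now visible to Spark SQL.
        spark.sql("SHOW TABLES").show()
      }
    }

The same property can also be set in hive-site.xml on Spark's classpath instead of in code.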
