Connecting Apache Spark to Apache Hive remotely

Question

I can load data from a Hive server in the same cluster where Apache Spark is installed. But how can I load data into a DataFrame from a remote Hive server? Is the JDBC connector the only option for Hive?

Any suggestions on how I can do this?

jdbc hive apache-spark apache-spark-sql

user3313379 Oct 15 '15 at 8:34

1 answer

axlpado - Agile Lab · Accepted Answer · 2015-10-15T09:59:13+0000

You can use org.apache.spark.sql.hive.HiveContext to run SQL queries against Hive tables.
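For illustration, a minimal PySpark sketch of this approach. It assumes Spark 2.0 or later, where a SparkSession built with enableHiveSupport() takes over the role of HiveContext; the helper names and the database/table names are hypothetical. pyspark is imported inside build_hive_session so the rest of the module can be loaded without it.

```python
def build_hive_session(app_name="remote-hive-example"):
    """Create a SparkSession with Hive support (the Spark 2.x+ successor to HiveContext)."""
    # Imported lazily so this module can be imported without pyspark installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName(app_name)
        .enableHiveSupport()  # use the Hive metastore as Spark's table catalog
        .getOrCreate()
    )


def select_from_hive(spark, database, table, limit=None):
    """Run a SELECT against a Hive table and return the result as a DataFrame.

    database and table are interpolated directly into the SQL text, so they
    must be trusted identifiers, not user input.
    """
    query = f"SELECT * FROM {database}.{table}"
    if limit is not None:
        query += f" LIMIT {int(limit)}"
    return spark.sql(query)
```

For example, select_from_hive(build_hive_session(), "sales", "orders", limit=10).show() would print the first rows of a hypothetical sales.orders table. Passing the session in as an argument keeps the query helper independent of how the session was configured.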

You can also point Spark directly at the HDFS directory where the table's data is actually stored. This can be more efficient, since the SQL query does not need to be parsed and no schema has to be applied to the files through Hive.
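A rough sketch of reading the files directly, assuming a managed table in the default warehouse location (/user/hive/warehouse) stored as Parquet. Tables created with a custom LOCATION, a different warehouse directory, or another file format (ORC, text) need the path or fmt adjusted. The namenode address and helper names are hypothetical.

```python
DEFAULT_WAREHOUSE = "/user/hive/warehouse"


def managed_table_path(namenode, database, table, warehouse=DEFAULT_WAREHOUSE):
    """Return the HDFS URI of a managed Hive table's directory (default layout assumed)."""
    warehouse = warehouse.rstrip("/")
    if database == "default":
        # Tables in the default database live directly under the warehouse dir.
        table_dir = f"{warehouse}/{table}"
    else:
        # Other databases get their own <database>.db subdirectory.
        table_dir = f"{warehouse}/{database}.db/{table}"
    return f"hdfs://{namenode}{table_dir}"


def load_table_files(spark, path, fmt="parquet"):
    """Read the table's files straight from HDFS, bypassing the Hive metastore."""
    return spark.read.format(fmt).load(path)
```

For example, load_table_files(spark, managed_table_path("namenode:8020", "sales", "orders")) would read hdfs://namenode:8020/user/hive/warehouse/sales.db/orders as Parquet.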

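If the Hive cluster is external, you need to set hive.metastore.uris (for example, in the hive-site.xml that Spark reads from its conf/ directory) so that Spark can reach the remote metastore.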
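A minimal sketch that generates the relevant hive-site.xml fragment with Python's standard library. The host name is a placeholder, 9083 is the metastore's conventional default Thrift port, and multiple metastores are written as a comma-separated list. Check the port and hosts against your own cluster's configuration.

```python
import xml.etree.ElementTree as ET


def build_hive_site(metastore_uris):
    """Return hive-site.xml content that points clients at remote Hive metastore(s)."""
    configuration = ET.Element("configuration")
    prop = ET.SubElement(configuration, "property")
    ET.SubElement(prop, "name").text = "hive.metastore.uris"
    # Several metastores (e.g. for failover) are given as one comma-separated value.
    ET.SubElement(prop, "value").text = ",".join(metastore_uris)
    return ET.tostring(configuration, encoding="unicode")


print(build_hive_site(["thrift://metastore-host:9083"]))
```

The printed XML would be saved as hive-site.xml in Spark's conf/ directory, after which a Hive-enabled Spark session resolves table names through the remote metastore.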
