Difference between the Hive Thrift server and the Spark distribution's Thrift server

What is the difference between starting a Hive server with each of the following two commands?

  • hive --service hiveserver2
  • Starting the Hive Thrift server from spark/sbin$ ./start-thriftserver.sh

Do they listen on different ports?

Which one should I use to establish a JDBC connection using the Apache Hive JDBC driver in my Java class?

+5
2 answers

HiveServer2 is a SQL-on-Hive processor that can use MapReduce, Spark, or Tez as its execution engine. Hive creates the execution plan and then calls the execution engine to run the query. Optimization is performed by Hive.

I am a heavy Spark user, but I wanted Hive to be available for running ad hoc queries through Hue. After some research, I see that Hive 1.2.1 supports Spark up to 1.4.1 as an execution engine. Hive 2 has a dependency on Spark 1.5, but I have not tried running it with 1.5 or 1.6.

The Spark Thrift server can replace HiveServer2. It uses Spark to actually run the query and builds its own execution plan (which may or may not be better than Hive's), but it gives you access to other Spark sources such as RDDs, text files, etc. Of course, you can run the Thrift server with the latest Spark.
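Since the Spark Thrift server speaks the same HiveServer2 protocol, the same Hive JDBC driver and `jdbc:hive2://` URL scheme should work against either server. Both listen on port 10000 by default unless `hive.server2.thrift.port` is changed, so they cannot share a host on the default port at the same time. The following is a minimal sketch, not a definitive setup: the class name `HiveJdbcExample`, the host `localhost`, the database `default`, and the empty user name and password are assumptions for an unsecured local server, and the Hive JDBC driver jar is assumed to be on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class HiveJdbcExample {

    // Builds a HiveServer2-style JDBC URL. The same scheme is used whether the
    // server was started by Hive or by Spark's start-thriftserver.sh.
    public static String buildUrl(String host, int port, String database) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    public static void main(String[] args) {
        // Assumed values: adjust host, port, and database to your deployment.
        String url = buildUrl("localhost", 10000, "default");

        // Empty credentials are an assumption for a server without authentication.
        try (Connection conn = DriverManager.getConnection(url, "", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        } catch (SQLException e) {
            // Reached if no server is listening or the driver jar is missing.
            System.err.println("Could not query " + url + ": " + e.getMessage());
        }
    }
}
```

Only the port in the URL needs to change if one of the two servers is moved off the default so that both can run side by side.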

+2

I think both do the same thing, except that when you start the Hive Thrift server from Spark, it adds another CLI service to the Thrift server, which should add a Spark SQL context to the Thrift API.

+1

Source: https://habr.com/ru/post/1215532/

