In Java, I was able to register the DataFrame as a temporary table and read its contents through beeline (just like a regular Hive table).
I have not posted the entire program (assuming you already know how to create DataFrames).
    import org.apache.spark.sql.DataFrame;
    import org.apache.spark.sql.hive.HiveContext;
    import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2;

    // sc is the JavaSparkContext; sc.sc() returns the underlying SparkContext
    HiveContext sqlContext = new HiveContext(sc.sc());
    DataFrame orgDf = sqlContext.createDataFrame(orgPairRdd.values(), OrgMaster.class);
orgPairRdd is a JavaPairRDD; orgPairRdd.values() holds the OrgMaster objects (each built from a string read from HBase).
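For context, here is a rough sketch of how such a pair RDD could be built from an HBase table. The table name "org_master", the column family/qualifier, and the fromHBaseString helper are all hypothetical; adapt them to your own schema.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.spark.api.java.JavaPairRDD;
    import scala.Tuple2;

    Configuration hbaseConf = HBaseConfiguration.create();
    hbaseConf.set(TableInputFormat.INPUT_TABLE, "org_master");   // hypothetical table name

    // Read the HBase table as (rowKey, Result) pairs
    JavaPairRDD<ImmutableBytesWritable, Result> hbaseRdd =
            sc.newAPIHadoopRDD(hbaseConf, TableInputFormat.class,
                    ImmutableBytesWritable.class, Result.class);

    // Convert each row into (rowKey, OrgMaster); fromHBaseString is a hypothetical parser
    JavaPairRDD<String, OrgMaster> orgPairRdd = hbaseRdd.mapToPair(tuple -> {
        String rowKey = Bytes.toString(tuple._1().get());
        String raw = Bytes.toString(tuple._2().getValue(
                Bytes.toBytes("cf"), Bytes.toBytes("data")));   // hypothetical family/qualifier
        return new Tuple2<>(rowKey, OrgMaster.fromHBaseString(raw));
    });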
OrgMaster is a serializable Java bean class.
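The bean only needs to follow the usual JavaBean conventions (no-arg constructor, getters/setters) so that createDataFrame can derive the schema by reflection. A minimal sketch is below; the field names and the fromHBaseString parsing logic are made up for illustration.

    import java.io.Serializable;

    public class OrgMaster implements Serializable {
        private String orgId;      // hypothetical field
        private String orgName;    // hypothetical field

        public OrgMaster() {}      // no-arg constructor needed for bean reflection

        public String getOrgId() { return orgId; }
        public void setOrgId(String orgId) { this.orgId = orgId; }
        public String getOrgName() { return orgName; }
        public void setOrgName(String orgName) { this.orgName = orgName; }

        // hypothetical helper that parses the raw HBase string into a bean
        public static OrgMaster fromHBaseString(String raw) {
            String[] parts = raw.split("\\|");
            OrgMaster m = new OrgMaster();
            m.setOrgId(parts[0]);
            m.setOrgName(parts.length > 1 ? parts[1] : null);
            return m;
        }
    }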
    // register the DataFrame as a temporary table and expose it over the thrift server
    orgDf.registerTempTable("spark_org_master_table");
    HiveThriftServer2.startWithContext(sqlContext);
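Side note: if port 10000 were already occupied on your machine, you should be able to move the embedded thrift server to a different port by setting the standard HiveServer2 property on the context before calling startWithContext. This is a hedged suggestion; I have only used the default port here.

    // assumption: the embedded thrift server honors the standard HiveServer2 port property
    sqlContext.setConf("hive.server2.thrift.port", "10015");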
I submitted the program in local mode (the regular HiveServer2 thrift server is not running on port 10000 on this machine, so that port is free).
    hadoop_classpath=$(hadoop classpath)
    HBASE_CLASSPATH=$(hbase classpath)

    spark-1.5.2/bin/spark-submit \
      --name tempSparkTable \
      --class packageName.SparkCreateOrgMasterTableFile \
      --master local[4] \
      --num-executors 4 \
      --executor-cores 4 \
      --executor-memory 8G \
      --conf "spark.executor.extraClassPath=${HBASE_CLASSPATH}:${hadoop_classpath}" \
      --conf "spark.driver.extraClassPath=${HBASE_CLASSPATH}" \
      --jars /path/programName-SNAPSHOT-jar-with-dependencies.jar \
      /path/programName-SNAPSHOT.jar
In another terminal, start beeline and point it at the thrift server started by this Spark program:
/opt/hive/hive-1.2/bin/beeline -u jdbc:hive2://<ipaddressofMachineWhereSparkPgmRunninglocally>:10000 -n anyUsername
Running show tables will display the table registered from Spark.
You can also describe the table. In this example:

    describe spark_org_master_table;
Then you can run regular queries in beeline against this table (until you kill the Spark program).
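For example (the column names come from whatever fields your OrgMaster bean exposes, so select * is the safest illustration):

    select * from spark_org_master_table limit 10;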