So here is the setup.
I currently have two Spark application initializations. I need to pass data between them, preferably through a shared SparkContext / SQLContext so that I can simply query a temporary table. I am currently using Parquet files to transfer the data, but is this possible in any other way?
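For context, the Parquet handoff currently looks roughly like this (a minimal sketch; the path /tmp/shared.parquet and the table name are made up, and both sides are shown in Java for brevity since the SparkR side is analogous):

// writing application: persist the DataFrame as Parquet (Spark 1.4+ writer API)
df.write().parquet("/tmp/shared.parquet");

// reading application: load the Parquet file back and expose it as a temp table
DataFrame shared = sqlContext.read().parquet("/tmp/shared.parquet");
shared.registerTempTable("shared");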
The MasterURL in both applications points to the same Spark master.
Launch Spark through the terminal:
/opt/spark/sbin/start-master.sh; /opt/spark/sbin/start-slave.sh spark://`hostname`:7077
Setting up the Java application:

// conf = setMaster(MasterURL), 6G executor memory, and 4 cores
JavaSparkContext context = new JavaSparkContext(conf);
SQLContext sqlContext = new SQLContext(context.sc());
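For completeness, conf is built along these lines (a sketch; the app name is arbitrary):

SparkConf conf = new SparkConf()
        .setAppName("JavaApp")              // hypothetical name
        .setMaster(MasterURL)               // same master as the SparkR app
        .set("spark.executor.memory", "6g")
        .set("spark.cores.max", "4");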
Later I register an existing DataFrame as a temp table:

// register an existing DataFrame as a temporary table
df.registerTempTable("table");
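The idea is that the data can then be queried through the SQLContext that registered the table, e.g. (sketch):

// query the registered temp table through the same SQLContext
DataFrame result = sqlContext.sql("SELECT * FROM table");
result.show();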
and the SparkR application:
sc <- sparkR.init(master='MasterURL', sparkEnvir=list(spark.executor.memory='6G', spark.cores.max='4'))
sqlContext <- sparkRSQL.init(sc)