Sharing a SparkContext between Java and R applications on the same master

So here is the setup.

I currently have two Spark application initializations. I need to pass data between them, preferably through a shared SparkContext / SQLContext so that I can simply query a temporary table. I am currently using Parquet files to transfer the data (a sketch of that handoff follows the SparkR snippet below), but is this possible in any other way?

MasterURL points to the same Spark master.

Launch Spark through the terminal:

/opt/spark/sbin/start-master.sh; /opt/spark/sbin/start-slave.sh spark://`hostname`:7077 

Setting up a Java application:

 // conf = setMaster(MasterURL) with 6G executor memory and 4 cores
 JavaSparkContext context = new JavaSparkContext(conf);
 SQLContext sqlContext = new SQLContext(context.sc());

Later I register an existing DataFrame as a temporary table:

 // register the existing DataFrame as a temp table
 df.registerTempTable("table");

and

SparkR:

 sc <- sparkR.init(master='MasterURL', sparkEnvir=list(spark.executor.memory='6G', spark.cores.max='4'))
 sqlContext <- sparkRSQL.init(sc)

 # attempt to query the temp table registered by the Java application
 df <- sql(sqlContext, "SELECT * FROM table")  # throws the error
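For reference, a minimal sketch of the Parquet handoff mentioned at the top, on the Java side (Spark 1.4+ DataFrame API assumed; the path /tmp/shared/table.parquet is a made-up placeholder, and any location visible to both drivers, such as a shared disk or HDFS, would do):

 // Java application: persist the DataFrame somewhere both drivers can reach.
 // The path below is only an illustrative placeholder.
 df.write().mode("overwrite").parquet("/tmp/shared/table.parquet");

 // SparkR application: load the same files back (R, shown here as a comment):
 //   df <- read.df(sqlContext, "/tmp/shared/table.parquet", source = "parquet")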
java r apache-spark spark-dataframe
1 answer

As far as I know, this is not possible with the current configuration. Tables created with registerTempTable are bound to the specific SQLContext that was used to create the corresponding DataFrame. Even if your Java and SparkR applications use the same master, their drivers run in separate JVMs and cannot share the same SQLContext.
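To make the scoping concrete, here is a minimal self-contained sketch (Spark 1.x Java API; the local master, class name and table name are made up for the example) showing that a temp table registered through one SQLContext is not visible through a second SQLContext even inside a single JVM, let alone across two driver JVMs:

 import org.apache.spark.SparkConf;
 import org.apache.spark.api.java.JavaSparkContext;
 import org.apache.spark.sql.DataFrame;
 import org.apache.spark.sql.SQLContext;

 public class TempTableScope {
     public static void main(String[] args) {
         // Local master used only for this illustration.
         SparkConf conf = new SparkConf().setAppName("temp-table-scope").setMaster("local[2]");
         JavaSparkContext jsc = new JavaSparkContext(conf);

         // Two SQLContexts over the same SparkContext; each keeps its own temp-table catalog.
         SQLContext sqlContextA = new SQLContext(jsc.sc());
         SQLContext sqlContextB = new SQLContext(jsc.sc());

         DataFrame df = sqlContextA.range(0, 10);  // small throwaway DataFrame
         df.registerTempTable("shared_df");        // hypothetical table name

         // Works: queried through the same SQLContext that registered the table.
         sqlContextA.sql("SELECT COUNT(*) FROM shared_df").show();

         try {
             // Fails: the second SQLContext never registered the table.
             sqlContextB.sql("SELECT COUNT(*) FROM shared_df").show();
         } catch (Exception e) {
             System.out.println("Lookup via the other SQLContext failed: " + e.getMessage());
         }

         jsc.stop();
     }
 }

The second lookup fails with a "table not found" analysis error, which is essentially the situation the SparkR snippet above runs into when it queries a table registered by the Java driver.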

There are tools like Apache Zeppelin that take a different approach, with a single SQLContext (and SparkContext) exposed to separate backends. That way you can register a table with, for example, Scala and read it from Python. There is a Zeppelin fork that provides some support for SparkR and R; you can check how it launches and interacts with the R backend.

