Spark 2.0: overriding SparkSession parameters through getOrCreate, but the changes do NOT show up in the WebUI

I am using Spark 2.0 with PySpark.

I override the SparkSession parameters with the getOrCreate method that was introduced in 2.0:

This method first checks whether there is a valid global default SparkSession, and if yes, returns that one. If no valid global default SparkSession exists, the method creates a new SparkSession and assigns the newly created SparkSession as the global default.

In case an existing SparkSession is returned, the config options specified in this builder will be applied to the existing SparkSession.

https://spark.apache.org/docs/2.0.1/api/python/pyspark.sql.html#pyspark.sql.SparkSession.Builder.getOrCreate
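To illustrate that behaviour, here is a minimal sketch (the session name and option value are just for illustration): a second getOrCreate call returns the session that already exists, and the runtime options from the new builder are applied to it.

 from pyspark.sql import SparkSession

 # First call creates the session and registers it as the global default.
 s1 = SparkSession.builder.master("local[2]").appName("first").getOrCreate()

 # Second call reuses that session; runtime options from this builder are applied to it.
 s2 = SparkSession.builder.config("spark.sql.shuffle.partitions", "4").getOrCreate()

 s1 is s2                                       # True - same session object
 s2.conf.get("spark.sql.shuffle.partitions")    # '4'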

So far, so good:

 from pyspark import SparkConf

 SparkConf().toDebugString()
 'spark.app.name=pyspark-shell\nspark.master=local[2]\nspark.submit.deployMode=client'

 spark.conf.get("spark.app.name")
 'pyspark-shell'

Then I redefine the SparkSession configuration, expecting to see the change in the WebUI, as promised by the docs:

appName(name)
Sets a name for the application, which will be shown in the Spark web UI.

https://spark.apache.org/docs/2.0.1/api/python/pyspark.sql.html#pyspark.sql.SparkSession.Builder.appName

 c = SparkConf()
 (c
  .setAppName("MyApp")
  .setMaster("local")
  .set("spark.driver.memory", "1g")
 )

 from pyspark.sql import SparkSession
 (SparkSession
  .builder
  .enableHiveSupport()  # metastore, serdes, Hive udf
  .config(conf=c)
  .getOrCreate())

 spark.conf.get("spark.app.name")
 'MyApp'

Now when I go to localhost:4040, I expect to see MyApp as the application name.

However, I still see the pyspark-shell application UI.

What am I doing wrong?

Thanks in advance!

apache-spark pyspark apache-spark-sql pyspark-sql
1 answer

I find the documentation a little misleading here, and when you work with Scala you actually see a warning like this:

 ... WARN SparkSession$Builder: Using an existing SparkSession; some configuration may not take effect.

This was more obvious prior to Spark 2.0, with a clear separation between the two contexts:

  • SparkContext configuration cannot be changed at runtime. You must first stop the existing context.
  • SQLContext configuration can be changed at runtime.

spark.app.name, like many other parameters, is bound to the SparkContext and cannot be changed without stopping the context.
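As a PySpark counterpart of the Scala walkthrough below, here is a rough sketch of that distinction (assuming the pyspark shell, where spark is already defined; the values are just illustrative):

 # SQL/runtime options can be changed on the live session...
 spark.conf.set("spark.sql.shuffle.partitions", "2001")
 spark.conf.get("spark.sql.shuffle.partitions")   # '2001'

 # ...but SparkContext-level settings stay as they were when the context
 # was created, and that is what the WebUI shows.
 spark.sparkContext.appName                       # 'pyspark-shell'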

Reusing an existing SparkContext / SparkSession

 import org.apache.spark.SparkConf
 import org.apache.spark.sql.SparkSession

 spark.conf.get("spark.sql.shuffle.partitions")
 String = 200

 val conf = new SparkConf()
   .setAppName("foo")
   .set("spark.sql.shuffle.partitions", "2001")

 val spark = SparkSession.builder.config(conf).getOrCreate()
 ... WARN SparkSession$Builder: Using an existing SparkSession ...
 spark: org.apache.spark.sql.SparkSession = ...

 spark.conf.get("spark.sql.shuffle.partitions")
 String = 2001

While the spark.app.name config is updated:

 spark.conf.get("spark.app.name") 
 String = foo 

it does not affect the SparkContext:

 spark.sparkContext.appName 
 String = Spark shell 

Stop an existing SparkContext / SparkSession

Now stop the session and repeat the process:

 spark.stop
 val spark = SparkSession.builder.config(conf).getOrCreate()

 ... WARN SparkContext: Using an existing SparkContext ...
 spark: org.apache.spark.sql.SparkSession = ...
 spark.sparkContext.appName 
 String = foo 

Interestingly, when we stop the session we still get a warning about using an existing SparkContext, but you can check that it is actually stopped.
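Translated back to the PySpark setup from the question, the practical fix is the same idea; a rough sketch reusing the question's MyApp config (assuming spark is the existing session from the pyspark shell):

 from pyspark import SparkConf
 from pyspark.sql import SparkSession

 # Stop the session (and the SparkContext behind it) before rebuilding it.
 spark.stop()

 c = (SparkConf()
      .setAppName("MyApp")
      .setMaster("local")
      .set("spark.driver.memory", "1g"))

 spark = (SparkSession
          .builder
          .enableHiveSupport()
          .config(conf=c)
          .getOrCreate())

 spark.sparkContext.appName   # 'MyApp' - this is the name the WebUI at :4040 should show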

