Spark 2.0: overriding SparkSession parameters through getOrCreate, but the changes do NOT show up in the WebUI

I am using Spark 2.0 with PySpark.

I override the SparkSession parameters with the getOrCreate method that was introduced in 2.0:

This method first checks whether there is a valid global default SparkSession, and if yes, returns that one. If no valid global default SparkSession exists, the method creates a new SparkSession and assigns the newly created SparkSession as the global default.

In case an existing SparkSession is returned, the config options specified in this builder will be applied to the existing SparkSession.

https://spark.apache.org/docs/2.0.1/api/python/pyspark.sql.html#pyspark.sql.SparkSession.Builder.getOrCreate
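To illustrate that behaviour, here is a minimal sketch (the session name and option value are just for illustration): a second getOrCreate call returns the session that already exists, and the runtime options from the new builder are applied to it.

 from pyspark.sql import SparkSession

 # First call creates the session and registers it as the global default.
 s1 = SparkSession.builder.master("local[2]").appName("first").getOrCreate()

 # Second call reuses that session; runtime options from this builder are applied to it.
 s2 = SparkSession.builder.config("spark.sql.shuffle.partitions", "4").getOrCreate()

 s1 is s2                                       # True - same session object
 s2.conf.get("spark.sql.shuffle.partitions")    # '4'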

So far, so good:

 from pyspark import SparkConf

 SparkConf().toDebugString()
 'spark.app.name=pyspark-shell\nspark.master=local[2]\nspark.submit.deployMode=client'

 spark.conf.get("spark.app.name")
 'pyspark-shell'

Then I redefine the SparkSession configuration, expecting to see the change in the WebUI, as promised by the docs:

appName(name)
Sets a name for the application, which will be shown in the Spark web UI.

https://spark.apache.org/docs/2.0.1/api/python/pyspark.sql.html#pyspark.sql.SparkSession.Builder.appName

 c = SparkConf()
 (c
  .setAppName("MyApp")
  .setMaster("local")
  .set("spark.driver.memory", "1g")
 )

 from pyspark.sql import SparkSession
 (SparkSession
  .builder
  .enableHiveSupport()  # metastore, serdes, Hive udf
  .config(conf=c)
  .getOrCreate())

 spark.conf.get("spark.app.name")
 'MyApp'

Now when I go to localhost:4040, I expect to see MyApp as the application name.

However, I still see the pyspark-shell application UI.

What am I doing wrong?

Thanks in advance!

apache-spark pyspark apache-spark-sql pyspark-sql
1 answer

I find the documentation a little misleading here, and when you work with Scala you actually see a warning like this:

 ... WARN SparkSession$Builder: Using an existing SparkSession; some configuration may not take effect.

This was more obvious prior to Spark 2.0, with a clear separation between the two contexts:

  • SparkContext configuration cannot be changed at runtime. You must first stop the existing context.
  • SQLContext configuration can be changed at runtime.

spark.app.name, like many other parameters, is bound to the SparkContext and cannot be changed without stopping the context.
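As a PySpark counterpart of the Scala walkthrough below, here is a rough sketch of that distinction (assuming the pyspark shell, where spark is already defined; the values are just illustrative):

 # SQL/runtime options can be changed on the live session...
 spark.conf.set("spark.sql.shuffle.partitions", "2001")
 spark.conf.get("spark.sql.shuffle.partitions")   # '2001'

 # ...but SparkContext-level settings stay as they were when the context
 # was created, and that is what the WebUI shows.
 spark.sparkContext.appName                       # 'pyspark-shell'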

Reusing an existing SparkContext / SparkSession

 import org.apache.spark.SparkConf
 import org.apache.spark.sql.SparkSession

 spark.conf.get("spark.sql.shuffle.partitions")
 String = 200

 val conf = new SparkConf()
   .setAppName("foo")
   .set("spark.sql.shuffle.partitions", "2001")

 val spark = SparkSession.builder.config(conf).getOrCreate()
 ... WARN SparkSession$Builder: Using an existing SparkSession ...
 spark: org.apache.spark.sql.SparkSession = ...

 spark.conf.get("spark.sql.shuffle.partitions")
 String = 2001

While the spark.app.name config is updated:

 spark.conf.get("spark.app.name") 
 String = foo 

it does not affect the SparkContext:

 spark.sparkContext.appName 
 String = Spark shell 

Stop an existing SparkContext / SparkSession

Now stop the session and repeat the process:

 spark.stop
 val spark = SparkSession.builder.config(conf).getOrCreate()

 ... WARN SparkContext: Using an existing SparkContext ...
 spark: org.apache.spark.sql.SparkSession = ...
 spark.sparkContext.appName 
 String = foo 

Interestingly, when we stop the session we still get a warning about using an existing SparkContext, but you can check that it is actually stopped.
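Translated back to the PySpark setup from the question, the practical fix is the same idea; a rough sketch reusing the question's MyApp config (assuming spark is the existing session from the pyspark shell):

 from pyspark import SparkConf
 from pyspark.sql import SparkSession

 # Stop the session (and the SparkContext behind it) before rebuilding it.
 spark.stop()

 c = (SparkConf()
      .setAppName("MyApp")
      .setMaster("local")
      .set("spark.driver.memory", "1g"))

 spark = (SparkSession
          .builder
          .enableHiveSupport()
          .config(conf=c)
          .getOrCreate())

 spark.sparkContext.appName   # 'MyApp' - this is the name the WebUI at :4040 should show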

