What happens if SparkSession is not closed?

What is the difference between the following two?

    import org.apache.spark.sql.SparkSession

    object Example1 {
      def main(args: Array[String]): Unit = {
        // Created outside `try` so that `spark` is in scope in `finally`
        val spark = SparkSession.builder.getOrCreate()
        try {
          // spark code here
        } finally {
          spark.close()
        }
      }
    }

    object Example2 {
      val spark = SparkSession.builder.getOrCreate()

      def main(args: Array[String]): Unit = {
        // spark code here
      }
    }

I know that SparkSession implements Closeable, which hints that it needs to be closed. However, I can't think of any problems if SparkSession is created only as in Example2 and never closed explicitly. If the Spark application fails (and exits the main method), the JVM will terminate and the SparkSession will disappear with it. Is that right? IMO, the fact that SparkSession is a singleton shouldn't matter much either.

1 answer

You should always close SparkSession when you are done using it (even if the only reason is to follow the good practice of giving back what was given to you).

Closing a SparkSession may release cluster resources that can then be given to another application.
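A minimal sketch of closing the session deterministically, assuming Scala 2.13+ (for `scala.util.Using`) and spark-sql on the classpath. Because SparkSession implements Closeable, `Using.resource` can manage it and calls `close()` when the block ends, even if the body throws. The app name and the `local[*]` master are illustrative assumptions, not part of the original answer.

```scala
import org.apache.spark.sql.SparkSession
import scala.util.Using

object ClosedSessionExample {
  // Runs a small job and closes the session afterwards.
  def countRows(): Long =
    Using.resource(
      SparkSession.builder
        .appName("closed-session-example") // hypothetical app name
        .master("local[*]")                // assumption: local mode for the sketch
        .getOrCreate()
    ) { spark =>
      // spark.range(1, 4) yields ids 1, 2, 3 (the end is exclusive)
      spark.range(1, 4).count()
    } // close() runs here, whether the body returns or throws

  def main(args: Array[String]): Unit =
    println(countRows()) // prints 3
}
```

This is the same guarantee as the try/finally in Example1, just without the boilerplate.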

SparkSession is a session and as such holds some resources that consume JVM memory. You can have as many SparkSessions as you want (see SparkSession.newSession to create another session), but you do not want them to hold memory they should not, so drop the ones you no longer need. Keep in mind that close is not a per-session release: sessions created with newSession share one SparkContext, and close stops that SparkContext, which ends every session built on it.
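To illustrate how sessions created with `newSession` relate to each other, here is a minimal sketch, assuming spark-sql on the classpath and local mode (the app name and the `nums` view are hypothetical). It checks that a temporary view registered in one session is not visible in the other, while both sessions reference the very same SparkContext, which is why closing either one stops it for both.

```scala
import org.apache.spark.sql.SparkSession

object SharedContextExample {
  // Returns (contexts are shared?, temp view visible in the second session?)
  def inspectSessions(): (Boolean, Boolean) = {
    val spark = SparkSession.builder
      .appName("shared-context-example") // hypothetical app name
      .master("local[*]")                // assumption: local mode for the sketch
      .getOrCreate()
    try {
      // newSession isolates SQL state (configs, temp views, UDFs)...
      val other = spark.newSession()
      spark.range(3).createOrReplaceTempView("nums")
      val viewVisible = other.catalog.tableExists("nums")
      // ...but both sessions sit on the same SparkContext instance.
      val shared = other.sparkContext eq spark.sparkContext
      (shared, viewVisible)
    } finally {
      // Stops the shared SparkContext, which ends both sessions.
      spark.close()
    }
  }

  def main(args: Array[String]): Unit =
    println(inspectSessions()) // prints (true,false)
}
```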

SparkSession is a Spark SQL wrapper around the Spark Core SparkContext, and therefore under the covers (as in any Spark application) you'd have cluster resources, i.e. vcores and memory, assigned to your SparkSession (via SparkContext). This means that as long as your SparkContext is in use (through the SparkSession), those cluster resources will not be assigned to other tasks (not only Spark ones, but also non-Spark applications submitted to the cluster). These cluster resources belong to you until you say "I am done", which translates to... close.

If, however, you simply exit the Spark application at the point where you would otherwise call close, you do not need to think about executing close, as the resources will be released automatically. The JVMs for the driver and executors terminate, as does the connection (heartbeat) with the cluster, and so the resources are eventually returned to the cluster manager, which can then offer them to some other application.

