Spark shows a different number of cores than what is passed to it using spark-submit

TL;DR

The Spark UI shows a different number of cores and amount of memory than what I ask for when using spark-submit

More details:

I am running Spark 1.6 in standalone mode. When I run spark-submit, I give it 1 executor instance with 1 core per executor, as well as 1 core for the driver. I would expect my application to run with a total of 2 cores. When I check the Environment tab in the UI, I see that it picked up exactly the parameters I gave it, yet it still seems to be using a different number of cores. You can see it here:

(screenshot of the Spark UI omitted)

This is my spark-defaults.conf that I use:

spark.executor.memory 5g
spark.executor.cores 1
spark.executor.instances 1
spark.driver.cores 1
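
For reference, roughly the same settings could be passed directly on the spark-submit command line. The master URL and application JAR below are placeholders, not taken from the question, and note that --driver-cores only takes effect when the driver itself runs on the cluster (cluster deploy mode):

    spark-submit \
      --master spark://master:7077 \
      --executor-memory 5g \
      --executor-cores 1 \
      --driver-cores 1 \
      --conf spark.executor.instances=1 \
      my-app.jar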

Checking the Environment tab in the Spark UI confirms that these parameters were indeed accepted, but the UI still shows something else.

Does anyone have any idea what could make Spark use a different number of cores than what I requested? I have obviously tried searching, but did not find anything useful in this thread.

Thank you in advance

1 answer

TL;DR

Use spark.cores.max instead to define the total number of available cores, and thereby limit the number of executors.


Standalone mode uses a greedy strategy: as many executors will be created as the cores and memory available on your worker allow.

In your case, you specified 1 core and 5 GB of memory per executor. Spark will calculate the following:

  • Since 8 cores are available, it will try to create 8 executors.
  • However, since only 30 GB of memory is available, it can only create 6 executors: each executor gets 5 GB of memory, which adds up to 30 GB.
  • Thus, 6 executors will be created, using a total of 6 cores and 30 GB of memory.
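
A minimal sketch of that greedy sizing arithmetic (not Spark's actual scheduler code), assuming a single worker with 8 cores and 30 GB of memory:

    // Rough sketch of the standalone greedy executor sizing, not Spark's real scheduler code.
    object ExecutorCountSketch {
      // How many executors fit on one worker, limited by whichever resource runs out first.
      def executorsPerWorker(workerCores: Int, workerMemGb: Int,
                             coresPerExecutor: Int, memPerExecutorGb: Int): Int = {
        val limitedByCores  = workerCores / coresPerExecutor   // 8 / 1 = 8
        val limitedByMemory = workerMemGb / memPerExecutorGb   // 30 / 5 = 6
        math.min(limitedByCores, limitedByMemory)              // memory is the bottleneck: 6
      }

      def main(args: Array[String]): Unit = {
        println(executorsPerWorker(workerCores = 8, workerMemGb = 30,
                                   coresPerExecutor = 1, memPerExecutorGb = 5)) // prints 6
      }
    }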

Spark basically did what you asked for. To achieve what you want, you can use the spark.cores.max option and specify the exact number of cores you need.
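
For example, to end up with a single 1-core executor as asked for in the question, something like this could be added to spark-defaults.conf (the value is illustrative):

    spark.cores.max 1

The same limit can also be passed at submit time with --conf spark.cores.max=1, or with spark-submit's --total-executor-cores flag in standalone mode.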

A few notes:

  • spark.executor.instances is a YARN-only configuration
  • spark.driver.memory defaults to 1g

I am also working on making the notion of the number of executors in standalone mode easier to reason about; this may be integrated into a future Spark release and will hopefully help you figure out exactly how many executors you are going to have, without having to compute it on the fly.
