Spark does not use all configured memory

I am launching a Spark standalone application on a cluster of 10 nodes using Spark-2.1.0-SNAPSHOT. 9 nodes are workers; the 10th runs the master and the driver. Each node has 256 GB of memory. I am having difficulty fully utilizing my cluster.

I set the memory limit for executors and the driver to 200 GB using the following spark-shell settings:

spark-shell --executor-memory 200g --driver-memory 200g --conf spark.driver.maxResultSize=200g 

When my application starts, I can see that these values are set as expected, both on the console and in the /environment/ tab of the UI. But when I go to the /executors/ tab, I see that my nodes only received 114.3 GB of memory each, see the screenshot below.

[screenshot: /executors/ tab showing 114.3 GB of storage memory per executor]

The total memory shown there is 1.1 TB, whereas I expect 2 TB. I double-checked that no other processes are using the memory.
Any idea what the source of this inconsistency is? Am I missing some setting? Is this a bug in the /executors/ tab or in the Spark engine?

1 answer

You are using your memory fully; what you are looking at on that tab is only a fraction of it. By default, the unified (execution + storage) memory region is 60% of the heap (spark.memory.fraction = 0.6), and that is the figure the /executors/ tab reports as "Storage Memory".

From the Spark documentation:

Memory usage in Spark largely falls under one of two categories: execution and storage. Execution memory refers to that used for computation in shuffles, joins, sorts, and aggregations, while storage memory refers to that used for caching and propagating internal data across the cluster.

As of Spark 1.6, execution and storage share a unified memory region, so it is unlikely that you will need to tune spark.memory.fraction.
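The number on the /executors/ tab can be roughly reproduced from Spark 2.x's defaults. A minimal sketch, assuming the Spark 2.x UnifiedMemoryManager constants (300 MB reserved system memory, spark.memory.fraction = 0.6); the actual figure in the UI comes out a bit lower because the heap the JVM reports is slightly smaller than the -Xmx you request:

```python
# Rough estimate of the "Storage Memory" figure on the /executors/ tab,
# based on Spark 2.x defaults (an assumption, not the exact UI code path).
RESERVED_MB = 300        # reserved system memory in UnifiedMemoryManager
MEMORY_FRACTION = 0.6    # spark.memory.fraction default

def unified_memory_gb(executor_memory_gb):
    """Upper bound on the unified (execution + storage) region in GB."""
    heap_mb = executor_memory_gb * 1024
    return (heap_mb - RESERVED_MB) * MEMORY_FRACTION / 1024

print(round(unified_memory_gb(200), 1))  # ~119.8 GB
```

For a 200 GB executor this gives roughly 120 GB, close to the 114.3 GB shown in the UI; the gap is the difference between the requested and the JVM-reported heap, not memory that went missing.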

If you run on YARN, the "Memory Used" and "Memory Total" figures on the ResourceManager's main page will show the total memory usage.

