There are known issues with the default memory settings in clusters where the master machine type differs from the worker machine type, although in your case that is not the main problem.
When you saw the following:
    spark.executor.cores 4
    spark.executor.memory 9310m
this actually means that each worker node will run 2 executors, and each executor will use 4 cores, so that all 8 cores really are used on each worker. This way, if we give the AppMaster half of a machine, the AppMaster can be packed successfully alongside an executor.
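As a quick sketch of the packing arithmetic (the per-node figures follow from the defaults above; the container overhead is an assumption, since YARN reserves some per-container overhead on top of spark.executor.memory):

    worker cores:           8        (n1-standard-8)
    spark.executor.cores:   4        -> 8 / 4 = 2 executors per worker
    NodeManager memory:     22528m   -> ~11264m budget per executor
    spark.executor.memory:  9310m    (budget minus assumed YARN container overhead)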
The amount of memory made available to the NodeManager needs to leave some overhead for the NodeManager daemon itself and for miscellaneous other daemon services such as the DataNode, so roughly 80% of the machine's memory is left for the NodeManagers. Additionally, allocations must be a multiple of the minimum YARN allocation, so after flooring down to the nearest allocation multiple, that is where the 22528 MB comes from for n1-standard-8.
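To make the flooring step concrete, here is a back-of-the-envelope sketch (the daemon-overhead fraction and the minimum-allocation size are assumptions chosen to illustrate the mechanism, not Dataproc's published formula):

    n1-standard-8 total memory:            30720 MB
    minus daemon overhead (~75%):         ~23040 MB
    floored to a multiple of the assumed
    2048 MB minimum YARN allocation:       22528 MB  (11 x 2048)

    yarn.nodemanager.resource.memory-mb 22528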
If you add workers that have 60+ GB of RAM, then as long as you are using a master node of the same memory size, you should see a higher maximum threshold number.
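For example, if you create the cluster with the gcloud CLI, a larger-memory cluster with matching master and worker sizes might look like the following (the cluster name is a placeholder, and n1-highmem-16, a 104 GB machine type, is just one illustrative choice):

    gcloud dataproc clusters create my-cluster \
        --master-machine-type n1-highmem-16 \
        --worker-machine-type n1-highmem-16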
In any case, if you are running into OOM problems, it's not so much the memory per executor that matters as the memory per task. And if you increase spark.executor.cores at the same rate as spark.executor.memory, then the memory per task isn't actually increasing, so you won't be giving your application logic any more headroom that way; Spark uses spark.executor.cores to determine the number of concurrent tasks to run in the same memory space.
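Concretely, with the defaults above (a rough division that ignores how Spark partitions memory internally):

    memory per task ~= spark.executor.memory / spark.executor.cores
                    ~= 9310m / 4
                    ~= 2327m per concurrent task

Doubling both values, to 18620m and 8 cores, still leaves ~2327m per task.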
To get more memory per task, you should mainly try:
1. Use n1-highmem-* machine types
2. Try reducing spark.executor.cores while leaving spark.executor.memory the same
3. Try increasing spark.executor.memory while leaving spark.executor.cores the same
If you do (2) or (3) above, you will indeed be leaving cores idle compared to the default configuration, which tries to occupy every core, but that is really the only way to get more memory per task aside from moving to highmem instances.
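For example, two variations on the defaults above (illustrative values only; check that the resulting container size still fits within your NodeManager capacity):

    # option (2): fewer concurrent tasks sharing the same executor memory
    spark.executor.cores 2
    spark.executor.memory 9310m

    # option (3): same concurrency, larger executor memory (fewer executors fit per node)
    spark.executor.cores 4
    spark.executor.memory 18620m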