Google Cloud Dataproc Configuration Issues

Question

Google Cloud Dataproc Configuration Issues

I encounter various problems in some Spark LDA simulation models (mainly disassembly errors at seemingly random intervals) that I performed, which, in my opinion, is mainly due to the insufficient memory allocation of my artists. It would seem that this is due to the problematic configuration of the automatic cluster. My last attempt uses n1-standard-8 machines (8 cores, 30 GB of RAM) for both the main and the working nodes (6 workers, which means 48 shared cores).

But when I look at /etc/spark/conf/spark-defaults.conf , I see this:

 spark.master yarn-client spark.eventLog.enabled true spark.eventLog.dir hdfs://cluster-3-m/user/spark/eventlog # Dynamic allocation on YARN spark.dynamicAllocation.enabled true spark.dynamicAllocation.minExecutors 1 spark.dynamicAllocation.initialExecutors 100000 spark.dynamicAllocation.maxExecutors 100000 spark.shuffle.service.enabled true spark.scheduler.minRegisteredResourcesRatio 0.0 spark.yarn.historyServer.address cluster-3-m:18080 spark.history.fs.logDirectory hdfs://cluster-3-m/user/spark/eventlog spark.executor.cores 4 spark.executor.memory 9310m spark.yarn.executor.memoryOverhead 930 # Overkill spark.yarn.am.memory 9310m spark.yarn.am.memoryOverhead 930 spark.driver.memory 7556m spark.driver.maxResultSize 3778m spark.akka.frameSize 512 # Add ALPN for Bigtable spark.driver.extraJavaOptions -Xbootclasspath/p:/usr/local/share/google/alpn/alpn-boot-8.1.3.v20150130.jar spark.executor.extraJavaOptions -Xbootclasspath/p:/usr/local/share/google/alpn/alpn-boot-8.1.3.v20150130.jar

But these values do not make much sense. Why use only 4/8 executive cores? And only 9.3 / 30GB RAM? My impression was that all of this configuration should have been processed automatically, but even my attempts at manual configuration did not take me anywhere.

For example, I tried to run the shell using

 spark-shell --conf spark.executor.cores=8 --conf spark.executor.memory=24g

But then it failed using

 java.lang.IllegalArgumentException: Required executor memory (24576+930 MB) is above the max threshold (22528 MB) of this cluster! Please increase the value of 'yarn.scheduler.maximum-allocation-mb'.

I tried changing the bound value in /etc/hadoop/conf/yarn-site.xml so as not to affect. Even when I try to perform another cluster setup (for example, using artists with RAM larger than 60 GB), I get the same problem. For some reason, the maximum threshold remains at 22528 MB.

Is there something I'm doing wrong here, or is this a problem with Google auto setup?

+6

apache-spark google-cloud-platform google-cloud-dataproc lda

mustachio Dec 7 '15 at 18:32

source share

1 answer

Dennis huo · Accepted Answer · 2015-12-07T19:34:54+0000

There are known problems with the default memory configurations in clusters where the type of the host machine is different from the type of the working driver, although in your case this is not the main problem.

When you saw the following:

 spark.executor.cores 4 spark.executor.memory 9310m

this actually means that each working node will run 2 executors, and each executor will use 4 kernels, so that all 8 kernels will really be used for each worker. Thus, if we provide AppMaster with half of one machine, AppMaster can be successfully packaged next to the contractor.

The amount of memory provided by NodeManager should leave some overhead for the NodeManager daemon itself and so on. other daemon services, such as DataNode, so ~ 80% is left for NodeManagers. In addition, the distributions must be a multiple of the minimum YARN distribution, so after flooring to the nearest multiple distribution, where 22528 MB comes from n1-standard-8.

If you add workers who have more than 60 GB of RAM, then while you are using the node wizard of the same memory size, you should see a higher maximum threshold number.

In any case, if you are faced with OOM problems, then this is not so much memory for each artist as it is important, but rather memory for each task. And if you increase spark.executor.cores at the same time as spark.executor.memory , then the memory for each task does not actually increase, so in this case you will no longer allocate your application logic; Spark will use spark.executor.cores to determine the number of simultaneous tasks to work in the same memory space.

To get more memory for one task, you should basically try:

Use machine types n1-highmem- *
Try reducing spark.executor.cores, leaving spark.executor.memory the same
Try increasing the value of spark.executor.memory, leaving spark.executor.cores the same

If you do (2) or (3) above, you will really leave the kernels inactive compared to the default configuration, which tries to occupy all the kernels, but this is really the only way to get more memory in one task, except by going to highmem instances.

Google Cloud Dataproc Configuration Issues

More articles: