How to set the exact maximum number of simultaneously running tasks on a node in Hadoop 2.4.0 on Elastic MapReduce

According to http://blog.cloudera.com/blog/2014/04/apache-hadoop-yarn-avoiding-6-time-consuming-gotchas/ , the formula for determining the number of simultaneously running tasks on a node is:

min(yarn.nodemanager.resource.memory-mb / mapreduce.[map|reduce].memory.mb, yarn.nodemanager.resource.cpu-vcores / mapreduce.[map|reduce].cpu.vcores)

However, when setting these parameters (for a cluster of c3.2xlarge instances):

yarn.nodemanager.resource.memory-mb = 14336

mapreduce.map.memory.mb = 2048

yarn.nodemanager.resource.cpu-vcores = 8

mapreduce.map.cpu.vcores = 1,

I find that at most 4 tasks run at the same time on a node, when the formula indicates there should be 7. What is the deal?
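
For reference, the expected figure of 7 comes from plugging the map settings listed above into the Cloudera formula. Here is a minimal sketch of that arithmetic in Python. The function name `max_concurrent_tasks` is hypothetical, and the use of integer (floor) division is an assumption, on the grounds that a node cannot run a fraction of a container.

```python
def max_concurrent_tasks(node_memory_mb, task_memory_mb, node_vcores, task_vcores):
    """Evaluate the Cloudera formula for one task type (map or reduce).

    Hypothetical helper for illustration only. Floor division is an
    assumption: a partial container cannot be scheduled.
    """
    by_memory = node_memory_mb // task_memory_mb  # containers that fit in memory
    by_vcores = node_vcores // task_vcores        # containers that fit in vcores
    return min(by_memory, by_vcores)


# Values from the question:
#   yarn.nodemanager.resource.memory-mb  = 14336
#   mapreduce.map.memory.mb              = 2048
#   yarn.nodemanager.resource.cpu-vcores = 8
#   mapreduce.map.cpu.vcores             = 1
expected_map_tasks = max_concurrent_tasks(14336, 2048, 8, 1)
print(expected_map_tasks)  # 7, since min(14336 // 2048, 8 // 1) = min(7, 8)
```

Memory is the limiting resource here: 14336 / 2048 gives 7 containers, while the vcore ratio alone would allow 8.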

I am running Hadoop 2.4.0 on AMI 3.1.0.

1 answer

My empirical observation was wrong. The formula provided by Cloudera is correct and appears to give the expected number of simultaneous tasks, at least on AMI 3.3.1.
