How does YARN decide how many containers to create? (And why does this differ between S3a and HDFS?)

I use a recent version of Hadoop and run TestDFSIO tests (version 1.8) to compare the case where the default file system is HDFS with the case where it is an Amazon S3 bucket (accessed via S3a).
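For concreteness, the sketch below is a minimal, hypothetical Java driver for the kind of read run used here; it assumes the hadoop-mapreduce-client-jobclient tests jar (which ships org.apache.hadoop.fs.TestDFSIO) is on the classpath and that the flag names match this TestDFSIO version:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.TestDFSIO;
    import org.apache.hadoop.util.ToolRunner;

    public class RunDfsioRead {
        public static void main(String[] args) throws Exception {
            // new Configuration() picks up core-site.xml, so whatever
            // fs.default.name points at (hdfs:// or s3a://) is what gets tested.
            Configuration conf = new Configuration();
            // Equivalent to: hadoop jar hadoop-mapreduce-client-jobclient-*-tests.jar \
            //                TestDFSIO -read -nrFiles 100 -fileSize 1MB
            int rc = ToolRunner.run(conf, new TestDFSIO(),
                    new String[] {"-read", "-nrFiles", "100", "-fileSize", "1MB"});
            System.exit(rc);
        }
    }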

When reading 100 x 1 MB files with S3a as the default file system, I observe that the maximum number of containers shown in the YARN Web UI is lower than with HDFS as the default, and S3a is about 4 times slower.

When reading 1000 x 10 KB files with S3a as the default file system, the maximum number of containers shown in the YARN Web UI is at least 10 times lower than with HDFS as the default, and S3a is about 16 times slower. (For example, roughly 50 seconds of test execution time with HDFS as the default, compared to roughly 16 minutes with S3a as the default.)

The number of launched map tasks is, as expected, the same in each case. But why does YARN create at least 10 times fewer containers (for example, 117 with HDFS versus 8 with S3a)? How does YARN decide how many containers to create when the cluster's vcores, RAM, input data, and number of running map tasks are all identical, and only the underlying storage differs?

Of course, some performance difference between HDFS and Amazon S3 (via S3a) is to be expected when running the same TestDFSIO jobs; what I am after is understanding how YARN decides on the maximum number of containers it launches for jobs in which only the default file system changes, because right now it looks like YARN forgoes almost 90% of the parallelism it normally achieves when the default file system is HDFS.

The cluster has 15 nodes: 1 NameNode, 1 ResourceManager (YARN), and 13 DataNodes (worker nodes). Each node has 128 GB of RAM and a 48-core CPU. This is a dedicated test cluster: nothing else runs on it during the TestDFSIO runs.

For HDFS, dfs.blocksize is 256m, and each DataNode uses 4 hard drives (dfs.datanode.data.dir set to file:///mnt/hadoopData1,file:///mnt/hadoopData2,file:///mnt/hadoopData3,file:///mnt/hadoopData4).

For S3a, fs.s3a.block.size is set to 268435456, i.e. 256 MB, the same as the HDFS block size above.

The Hadoop tmp directory is on an SSD (hadoop.tmp.dir set to /mnt/ssd1/tmp in core-site.xml, and mapreduce.cluster.local.dir set to /mnt/ssd1/mapred/local in mapred-site.xml).
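As a sanity check on this configuration, a small hypothetical helper like the one below (assuming the *-site.xml files are on the classpath) prints the effective values of the settings discussed above:

    import org.apache.hadoop.conf.Configuration;

    public class PrintEffectiveConf {
        public static void main(String[] args) {
            Configuration conf = new Configuration(); // loads core-site.xml
            conf.addResource("hdfs-site.xml");
            conf.addResource("mapred-site.xml");
            String[] keys = {
                "fs.default.name", "dfs.blocksize", "dfs.datanode.data.dir",
                "fs.s3a.block.size", "hadoop.tmp.dir", "mapreduce.cluster.local.dir"
            };
            for (String key : keys) {
                System.out.println(key + " = " + conf.get(key));
            }
        }
    }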

The performance difference (default file system HDFS versus default file system S3a) is summarized below:

    TestDFSIO v. 1.8 (READ)

    fs.default.name                # of Files x Size of File   Launched Map Tasks   Max # of containers in YARN Web UI   Test exec time (sec)
    =============================  =========================   ==================   ==================================   ====================
    hdfs://hadoop1:9000            100 x 1 MB                  100                  117                                  19
    hdfs://hadoop1:9000            1000 x 10 KB                1000                 117                                  56
    s3a://emre-hadoop-test-bucket  100 x 1 MB                  100                  60                                   78
    s3a://emre-hadoop-test-bucket  1000 x 10 KB                1000                 8                                    1012

[Screenshots: YARN Web UI for 100 x 1 MB, default FS HDFS vs. default FS S3a]

[Screenshots: YARN Web UI for 1000 x 10 KB, default FS HDFS vs. default FS S3a]

1 answer

In short, one of the important criteria YARN uses to decide how many containers to create is data locality. When a file system other than HDFS is used, such as S3a to talk to Amazon S3 or another S3-compatible object store, it is the file system's responsibility to supply data locality information. In that case none of the data is local to any node: every node has to fetch the data over the network, or, put differently, every node has the same data locality.
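The difference is easy to see with the standard FileSystem API. The hypothetical diagnostic below prints the block locations reported for a file: against HDFS the hosts are real DataNodes, while an object-store connector that supplies no real locality typically reports a single placeholder host (such as localhost) for the whole file:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class PrintBlockLocations {
        public static void main(String[] args) throws Exception {
            // args[0] is a file URI, e.g. hdfs://hadoop1:9000/benchmarks/...
            // or s3a://emre-hadoop-test-bucket/benchmarks/...
            Path path = new Path(args[0]);
            FileSystem fs = path.getFileSystem(new Configuration());
            FileStatus status = fs.getFileStatus(path);
            for (BlockLocation block : fs.getFileBlockLocations(status, 0, status.getLen())) {
                System.out.println(block); // offset, length, hosts holding the block
            }
        }
    }

The MapReduce ApplicationMaster derives its container requests to YARN from exactly this kind of per-split location information, which is why the reported hosts matter for scheduling.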

The previous paragraph explains the container behavior I observed while running Hadoop MapReduce jobs against Amazon S3 via the S3a file system. To address the problem I have started working on a patch; its development is tracked in HADOOP-12878.

