What is the maximum number of files allowed in an HDFS directory?

What is the maximum number of files and directories allowed in an HDFS directory (Hadoop)?

+7
source
5 answers

Blocks and files are stored in a HashMap, so you are bounded by Integer.MAX_VALUE. The limit therefore applies not to a single directory but to the entire FileSystem.
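
As a rough illustration of that bound (a sketch, not taken from the HDFS source), Java collections report their size as a signed 32-bit int, so a single in-memory map tops out at Integer.MAX_VALUE entries:

 public class NamespaceBound {
     public static void main(String[] args) {
         // HashMap.size() returns an int, so a single map cannot report more
         // than Integer.MAX_VALUE entries -- roughly 2.1 billion objects.
         System.out.println(Integer.MAX_VALUE); // prints 2147483647
     }
 }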

+4
source

In modern versions of Apache Hadoop, various HDFS restrictions are controlled by configuration properties with fs-limits in the name, all of which have reasonable defaults. This question specifically asked about the number of children in a directory. That is determined by dfs.namenode.fs-limits.max-directory-items, and its default value is 1048576.

Refer to the Apache Hadoop documentation in hdfs-default.xml for a complete list of fs-limits configuration properties and their default values. Copied here for convenience:

 <property>
   <name>dfs.namenode.fs-limits.max-component-length</name>
   <value>255</value>
   <description>Defines the maximum number of bytes in UTF-8 encoding in each
     component of a path. A value of 0 will disable the check.</description>
 </property>

 <property>
   <name>dfs.namenode.fs-limits.max-directory-items</name>
   <value>1048576</value>
   <description>Defines the maximum number of items that a directory may contain.
     Cannot set the property to a value less than 1 or more than 6400000.</description>
 </property>

 <property>
   <name>dfs.namenode.fs-limits.min-block-size</name>
   <value>1048576</value>
   <description>Minimum block size in bytes, enforced by the Namenode at create
     time. This prevents the accidental creation of files with tiny block sizes
     (and thus many blocks), which can degrade performance.</description>
 </property>

 <property>
   <name>dfs.namenode.fs-limits.max-blocks-per-file</name>
   <value>1048576</value>
   <description>Maximum number of blocks per file, enforced by the Namenode on
     write. This prevents the creation of extremely large files which can
     degrade performance.</description>
 </property>

 <property>
   <name>dfs.namenode.fs-limits.max-xattrs-per-inode</name>
   <value>32</value>
   <description>Maximum number of extended attributes per inode.</description>
 </property>

 <property>
   <name>dfs.namenode.fs-limits.max-xattr-size</name>
   <value>16384</value>
   <description>The maximum combined size of the name and value of an extended
     attribute in bytes. It should be larger than 0, and less than or equal to
     maximum size hard limit which is 32768.</description>
 </property>

All of these settings ship with reasonable defaults chosen by the Apache Hadoop community. It is generally recommended not to change them except in unusual circumstances.
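
A minimal sketch (assuming the Hadoop client jars and your cluster's hdfs-site.xml are on the classpath) of checking the effective per-directory limit from Java; the class name and fallback value are illustrative:

 import org.apache.hadoop.conf.Configuration;

 public class DirectoryItemLimit {
     public static void main(String[] args) {
         // Loads core-default.xml and core-site.xml; adding hdfs-site.xml picks up
         // any site-specific override of the fs-limits properties.
         Configuration conf = new Configuration();
         conf.addResource("hdfs-site.xml");

         // Fall back to the documented default (1048576) if the property is unset.
         int maxDirItems = conf.getInt("dfs.namenode.fs-limits.max-directory-items", 1048576);
         System.out.println("Max children per directory: " + maxDirItems);
     }
 }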

+8
source

From http://blog.cloudera.com/blog/2009/02/the-small-files-problem/ :

Every file, directory and block in HDFS is represented as an object in the namenode's memory, each of which occupies about 150 bytes as a rule of thumb. So 10 million files, each using a block, would use about 3 gigabytes of memory. Scaling up much beyond this level is a problem with current hardware. Certainly a billion files is not feasible.
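
The 3 GB figure follows from counting two namespace objects per file, the inode itself plus its single block; a quick back-of-the-envelope sketch of that arithmetic:

 public class NamenodeMemoryEstimate {
     public static void main(String[] args) {
         long files = 10_000_000L;
         long bytesPerObject = 150L;   // rule-of-thumb figure from the quote above
         long objects = files * 2;     // one inode object plus one block object per file
         long bytes = objects * bytesPerObject;
         System.out.println(bytes / 1_000_000_000.0 + " GB"); // prints 3.0 GB
     }
 }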

+6
source

HDFS is specifically mentioned in this question, but a related question is how many files you can store in a Hadoop cluster.

The answer is different if you use the MapR file system; in that case, billions of files can be stored in a cluster without problems.

+1
source

In HDFS, the maximum length of a file name (path component) is 255 bytes, so the claim that a single file object takes only 150 bytes is not exact; it is only a rule of thumb. When estimating namenode memory, we should account for the maximum footprint of a single object.

0
source
