Hadoop namenode memory consumption?

Question

Can someone give a detailed analysis of namenode memory consumption? Or is there any reference material? I cannot find anything on the net. Thanks!

hadoop memory-consumption

jun zhou Nov 09 '12 at 9:03

4 answers

Pitt · Answer 1 · 2012-11-09T09:29:11+0000

I believe memory consumption depends on your HDFS setup, specifically on the overall size of the file system and on the block size. From the Hadoop NameNode wiki:

Use a good server with lots of RAM. The more RAM you have, the bigger the file system, or the smaller the block size.

From https://twiki.opensciencegrid.org/bin/view/Documentation/HadoopUnderstanding :

Namenode: The main Hadoop metadata server. This is the most critical part of the system, and there can only be one of them. It stores both the file system image and the file system journal. The namenode keeps all of the file system layout information (files, blocks, directories, permissions, etc.) and the block locations. The file system layout is persisted on disk, and the block locations are kept solely in memory. When a client opens a file, the namenode tells the client the locations of all the blocks in the file; the client then no longer needs to contact the namenode to transfer data.

The same site recommends the following:

Namenode: we recommend at least 8 GB of RAM (minimum 2 GB of RAM), preferably 16 GB or more. A rough rule of thumb is 1 GB per 100 TB of free disk space; actual requirements are about 1 GB per million objects (files, directories and blocks). The CPU requirement is any modern multi-core server processor. Typically, the namenode will use only 2-5% of your CPU. Since this is a single point of failure, the most important requirement is reliable hardware rather than high-performance hardware. We suggest a node with redundant power supplies and at least 2 hard drives.
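The two rules of thumb in the quote above can be compared side by side. This is only a minimal sketch: the function names and the cluster numbers are hypothetical, chosen for illustration, and the ratios are the approximate ones quoted, not measured values.

```python
def heap_gb_by_disk(disk_tb: float) -> float:
    """Rough rule from the quote: about 1 GB of RAM per 100 TB of disk space."""
    return disk_tb / 100.0


def heap_gb_by_objects(files: int, directories: int, blocks: int) -> float:
    """Closer rule from the quote: about 1 GB of RAM per million objects."""
    return (files + directories + blocks) / 1_000_000


# Hypothetical cluster, numbers chosen only for illustration.
rough = heap_gb_by_disk(disk_tb=500)
closer = heap_gb_by_objects(files=4_000_000, directories=500_000, blocks=6_000_000)

print(f"by disk space:   {rough:.1f} GB")
print(f"by object count: {closer:.1f} GB")
```

When the two estimates disagree, the object count is the one the quote calls the actual requirement, so it is probably the safer figure to size against.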

For a more detailed analysis of memory usage check this link: https://issues.apache.org/jira/browse/HADOOP-1687

You may also find this question interesting: Hadoop namenode memory usage

David gruzman · Answer 2 · 2012-11-09T20:10:04+0000

There are several technical limits on the NameNode (NN), and any of them can limit your scalability.

Memory. The NN consumes about 150 bytes per block. From this you can calculate how much RAM you need for your data. There is a good discussion in: Limiting the number of Namenode files.
IO. The NN performs 1 IO for each change to the file system (e.g. create, delete block, etc.), so your local IO must be able to keep up. It is harder to estimate how much you need. Since the number of blocks is already limited by memory, you will not hit this limit unless your cluster is very large. If it is, consider an SSD.
CPU. The namenode carries a significant load tracking the health of all blocks on all datanodes. Each datanode periodically reports the state of all its blocks. Again, unless the cluster is very large, this should not be a problem.
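As a rough illustration of the memory point, the per-block figure can be turned into a RAM estimate for a given amount of data. This is a sketch under stated assumptions: the function name is hypothetical, every block is assumed to be full, and only blocks are counted (files and directories are ignored).

```python
BYTES_PER_BLOCK = 150  # approximate per-block cost quoted in the answer

TIB = 2**40
MIB = 2**20


def block_memory_bytes(data_bytes: int, block_size_bytes: int) -> int:
    """Estimate NN memory for block metadata, assuming full blocks only."""
    # Integer ceiling division: a partially filled last block still counts.
    blocks = -(-data_bytes // block_size_bytes)
    return blocks * BYTES_PER_BLOCK


# Hypothetical example: 1024 TiB of (unreplicated) data in 128 MiB blocks.
mem = block_memory_bytes(data_bytes=1024 * TIB, block_size_bytes=128 * MIB)
print(f"{mem / MIB:.0f} MiB for block metadata")
```

Many small files would produce far more blocks for the same data volume, so a real cluster can need noticeably more than this estimate.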

user166555 · Answer 3 · 2016-09-15T13:11:55+0000

Calculation example:

200-node cluster
24 TB per node
Block size: 128 MB
Replication factor: 3

How much memory is required?

blocks = 200 * 24 * 2^20 / (128 * 3)

~ 12 million blocks, i.e. ~ 12,000 MB of memory.
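The arithmetic in this answer can be checked directly. The sketch below evaluates the same formula; the variable names are mine, and the conversion of 1000 MB of heap per million blocks is an assumption matching the answer's "12 million blocks, 12,000 MB" ratio.

```python
nodes = 200
tb_per_node = 24
block_size_mb = 128
replication = 3
MB_PER_TB = 2**20  # binary units, as in the formula above

# Total raw capacity in MB, divided by the space one replicated block occupies.
blocks = nodes * tb_per_node * MB_PER_TB // (block_size_mb * replication)

# Assumed rule: about 1000 MB of heap per million blocks.
heap_mb = blocks / 1_000_000 * 1000

print(f"blocks:  {blocks:,}")
print(f"heap MB: {heap_mb:,.1f}")
```

Evaluated with binary units the formula gives about 13.1 million blocks, which is the same order of magnitude as the rounded ~12 million quoted in the answer.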

devrimbaris · Answer 4 · 2016-12-18T18:21:03+0000

I think we should make a distinction between how much namenode memory is consumed by each namenode object and the general recommendations for sizing the namenode heap.

For the first case (consumption), AFAIK, each namenode object consumes an average of 150 bytes of memory. Namenode objects are files, blocks (not counting replicated copies), and directories. So, for a file with 3 blocks, this is 4 objects (1 file and 3 blocks) x 150 bytes = 600 bytes.

In the second case (heap sizing), the usual recommendation is to reserve 1 GB per 1 million blocks. If you calculate the raw consumption (150 bytes per block), you get 150 MB for 1 million blocks. You can see that this is much less than 1 GB per 1 million blocks, but you should also account for the files and directories.

I think this is a safe recommendation. Check out the following two links for a more general discussion and examples:

NameNode Heap Memory - Cloudera

Setting NameNode Heap Size - Hortonworks

Namenode internal memory structures
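The distinction drawn in this answer, raw per-object consumption versus the recommended heap reservation, can be sketched as follows. The helper names are hypothetical, and both constants are the approximate figures from the answer rather than values taken from Hadoop itself.

```python
BYTES_PER_OBJECT = 150           # average cost per file, block or directory
HEAP_BYTES_PER_MILLION = 10**9   # recommended reservation: ~1 GB per million blocks


def namespace_objects(files: int, blocks_per_file: int, directories: int = 0) -> int:
    """Count namenode objects; replicated copies of a block are not counted."""
    return files + files * blocks_per_file + directories


def raw_consumption_bytes(objects: int) -> int:
    return objects * BYTES_PER_OBJECT


# One file with 3 blocks: 4 objects.
one_file = raw_consumption_bytes(namespace_objects(files=1, blocks_per_file=3))

# One million blocks: raw consumption versus the recommended reservation.
million_blocks = raw_consumption_bytes(1_000_000)
headroom = HEAP_BYTES_PER_MILLION / million_blocks

print(one_file, "bytes for one 3-block file")
print(million_blocks // 10**6, "MB raw for 1 million blocks")
print(f"recommended heap is about {headroom:.1f}x the raw figure")
```

The gap between the two numbers is what leaves room for file and directory objects and for the rest of the namenode's working memory.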
