How Block Pools Function in the HDFS Federation

Question

I am reading Hadoop: The Definitive Guide, and one passage on this page confused me, so I drew an image that annotates each sentence of it.

The passage says (my notes are in parentheses):

Under federation, each namenode manages a namespace volume (the black squares in my image), which is made up of the metadata for the namespace and a block pool (the dark gray rectangle) containing all the blocks for the files in the namespace. Namespace volumes are independent of each other (in the image each namenode has its own, shared with nothing), which means that namenodes do not communicate with one another, and furthermore the failure of one namenode does not affect the availability of the namespaces managed by other namenodes. Block pool storage is not partitioned, however (so in the image it is shared by everyone), so datanodes register with each namenode in the cluster (again, shared with all namenodes) and store blocks from multiple block pools. (My question is: how can there be multiple block pools? Doesn't the whole paragraph say that every namenode has metadata pointing to the blocks, and therefore that they all share one block pool?)

I'm damn confused!

+4

hadoop hdfs

aa8y Jan 22 '13 at 14:01

3 answers

Just for clarity: if namenode NN-n in the diagram above is down, Pool-n will also be unavailable. So the blocks that the datanodes hold for Pool-n will not be accessible until namenode NN-n is restored. Or does it work differently?

+1

Manohar Nov 17 '13 at 1:47

I found this useful; it is from the Hadoop Operations book:
At first glance, it appears that federation is no different from simply having multiple discrete clusters, except for the client plug-in that makes them appear as one logical namespace. One of the major differentiating factors, however, is that each datanode in a federated cluster stores blocks for each namenode. When each namenode is formatted, it generates a block pool in which the block data associated with that namenode is stored. Each datanode, in turn, stores data for multiple block pools and communicates with each namenode. When a namenode receives a heartbeat from a datanode, it learns about the total space on the datanode consumed by the other block pools, as well as by non-HDFS data. The rationale for having all datanodes participate in all block pools, rather than simply having discrete clusters, is that this achieves better overall utilization of datanode capacity. If we instead had a separate set of datanodes entirely for a heavily used namenode A, the datanodes for namenode B would be underutilized while the namenode A datanodes struggled to keep up with the load.
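
To make the heartbeat and utilization point concrete, here is a minimal toy model in Python. It is only a sketch, not HDFS code: the `DataNode` class, the field names in the report, and the pool IDs `BP-A` and `BP-B` are all made up for illustration. It shows one disk shared by two block pools, and what the namenode that owns the lightly used pool would learn from a heartbeat.

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    """Toy datanode: one disk shared by every block pool (not real HDFS code)."""
    name: str
    capacity: int                 # total bytes on the node
    non_dfs_used: int = 0         # bytes consumed by non-HDFS data
    pool_used: dict = field(default_factory=dict)  # block pool id -> bytes used

    def remaining(self) -> int:
        return self.capacity - self.non_dfs_used - sum(self.pool_used.values())

    def store(self, pool_id: str, nbytes: int) -> None:
        # Any pool may use any free byte: storage is not partitioned per pool.
        if nbytes > self.remaining():
            raise ValueError(f"{self.name} is full")
        self.pool_used[pool_id] = self.pool_used.get(pool_id, 0) + nbytes

    def heartbeat(self, pool_id: str) -> dict:
        """Report sent to the namenode that owns pool_id (hypothetical fields)."""
        dfs_used = sum(self.pool_used.values())
        mine = self.pool_used.get(pool_id, 0)
        return {
            "capacity": self.capacity,
            "block_pool_used": mine,
            "other_pools_used": dfs_used - mine,
            "non_dfs_used": self.non_dfs_used,
            "remaining": self.remaining(),
        }


dn = DataNode("dn1", capacity=1000, non_dfs_used=100)
dn.store("BP-A", 600)   # heavily used namespace A
dn.store("BP-B", 50)    # lightly used namespace B

report_for_b = dn.heartbeat("BP-B")
print(report_for_b)
```

Because the free space is a single shared figure, the busy namespace can grow into capacity that the quiet namespace is not using, which is the utilization argument made in the quote.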

0

oleh Oct 21 '17 at 6:37

Charles Menguy · Accepted Answer · 2013-01-22T16:21:25+0000

Your view is not accurate regarding the Block Pool rectangle; it should read Block Pools (plural).

I think it's worth looking at another view:

So each block pool is managed independently of the others, and each one is the set of blocks belonging to a single namespace. The namenodes do not talk to each other, which makes sense.
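
One way to see the separate pools is on a datanode's disk. The sketch below assumes the layout used by Hadoop 2.x and later datanodes, where each data directory holds one `current/BP-<...>` subdirectory per block pool with block files named `blk_<id>` plus `.meta` checksum files. The helper names, the fake pool IDs, and the temporary directory are made up for the demo; on a real node you would point `blocks_per_pool` at one of the directories listed in `dfs.datanode.data.dir`.

```python
import tempfile
from collections import Counter
from pathlib import Path


def blocks_per_pool(data_dir: Path) -> Counter:
    """Count block files under each BP-* directory of one datanode data dir."""
    counts = Counter()
    for pool_dir in (data_dir / "current").glob("BP-*"):
        for path in pool_dir.rglob("blk_*"):
            # Skip the checksum files that sit next to each block file.
            if path.is_file() and not path.name.endswith(".meta"):
                counts[pool_dir.name] += 1
    return counts


def make_fake_data_dir(root: Path) -> Path:
    """Build a tiny fake data dir with two block pools (illustrative names)."""
    layout = {
        "BP-1111-10.0.0.1-1358000000000": ["blk_1001", "blk_1002"],
        "BP-2222-10.0.0.2-1358000000001": ["blk_1001"],
    }
    for pool, blocks in layout.items():
        finalized = root / "current" / pool / "current" / "finalized"
        finalized.mkdir(parents=True)
        for blk in blocks:
            (finalized / blk).write_bytes(b"data")
            (finalized / f"{blk}_1.meta").write_bytes(b"crc")
    return root


with tempfile.TemporaryDirectory() as tmp:
    counts = blocks_per_pool(make_fake_data_dir(Path(tmp)))

print(dict(counts))
```

Note that `blk_1001` appears in both fake pools without conflict, because a block is only identified within its own pool.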

From what I have read, the reason is that this allows a namespace to generate block IDs for new blocks without needing to coordinate with the other namespaces. A namenode failure also does not prevent the datanode from serving the other namenodes in the cluster.
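
Both points can be illustrated with a small toy model. This is a sketch under my own assumptions, not how HDFS is implemented: the `NameNode` and `DataNode` classes, the `alive` flag, and the pool IDs are hypothetical. Each namenode hands out block IDs from a private counter, and the datanode keys every block by the pair (block pool ID, block ID), so identical IDs from different namenodes never clash.

```python
import itertools


class NameNode:
    """Toy namenode owning one namespace and one block pool (not real HDFS code)."""

    def __init__(self, pool_id: str):
        self.pool_id = pool_id
        self.alive = True
        self._next_id = itertools.count(1)  # private counter, no coordination
        self.files = {}                     # path -> list of (pool_id, block_id)

    def add_file(self, path: str, num_blocks: int) -> list:
        blocks = [(self.pool_id, next(self._next_id)) for _ in range(num_blocks)]
        self.files[path] = blocks
        return blocks

    def locate(self, path: str) -> list:
        if not self.alive:
            raise ConnectionError(f"namenode for {self.pool_id} is down")
        return self.files[path]


class DataNode:
    """Toy datanode holding blocks from every pool, keyed by (pool_id, block_id)."""

    def __init__(self):
        self.blocks = {}

    def write(self, key, data: bytes) -> None:
        self.blocks[key] = data

    def read(self, key) -> bytes:
        return self.blocks[key]


nn_a, nn_b = NameNode("BP-A"), NameNode("BP-B")
dn = DataNode()
for nn, path in ((nn_a, "/a/log"), (nn_b, "/b/log")):
    for key in nn.add_file(path, 2):
        dn.write(key, f"{path}:{key[1]}".encode())

nn_a.alive = False  # simulate a failure of namenode A

# Namespace B is still fully readable through its own namenode.
data_b = [dn.read(k) for k in nn_b.locate("/b/log")]

# Namespace A cannot be resolved, although its blocks are still on the datanode.
try:
    nn_a.locate("/a/log")
    a_reachable = True
except ConnectionError:
    a_reachable = False
a_blocks_on_disk = sum(1 for pool, _ in dn.blocks if pool == "BP-A")

print(data_b, a_reachable, a_blocks_on_disk)
```

In this model both namenodes issue block IDs 1 and 2, yet the datanode ends up with four distinct blocks. When namenode A is down, its blocks stay on disk but cannot be looked up until A comes back, while namespace B is unaffected.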
