How HDFS selects a datanode for storage

As the title says: when a client asks to write a file to HDFS, how does HDFS (or the namenode) choose which datanodes store it? If the file is large, does HDFS keep all of its blocks on the same node, or on nodes in the same rack? And does HDFS provide an API that lets an application store a file on a specific datanode of its choosing?

+8
hadoop hdfs
5 answers

The code that selects the datanodes is in the ReplicationTargetChooser.chooseTarget() method.

The comment says that:

The replica placement strategy is that if the writer is on a datanode, the first replica is placed on the local machine, otherwise on a random datanode. The second replica is placed on a datanode that is on a different rack. The third replica is placed on a datanode that is on the same rack as the first replica.
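The quoted strategy can be sketched in a few lines. This is only a simplified illustration, not Hadoop's code: the function name `choose_targets`, the `topology` dictionary, and the node and rack names are all made up for the example, and the real implementation also accounts for things such as node load and free space.

```python
import random
from typing import Dict, List, Optional


def choose_targets(topology: Dict[str, List[str]],
                   writer: Optional[str] = None,
                   rng: Optional[random.Random] = None) -> List[str]:
    """Pick three datanodes following the strategy quoted above.

    topology: rack name -> datanodes on that rack (hypothetical layout).
    writer:   datanode the client runs on, or None if the client is off-cluster.
    Assumes the cluster has at least three datanodes.
    """
    rng = rng or random.Random()
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}
    all_nodes = list(rack_of)

    def pick(preferred: List[str], used: List[str]) -> str:
        # Use a preferred node if one is free; otherwise fall back to any free node.
        free = [n for n in preferred if n not in used]
        if not free:
            free = [n for n in all_nodes if n not in used]
        return rng.choice(free)

    # 1st replica: the writer's own node if it is a datanode, else a random node.
    first = writer if writer in rack_of else rng.choice(all_nodes)

    # 2nd replica: a datanode on a different rack from the first.
    other_racks = [n for n in all_nodes if rack_of[n] != rack_of[first]]
    second = pick(other_racks, [first])

    # 3rd replica: a datanode on the same rack as the first replica.
    same_rack = topology[rack_of[first]]
    third = pick(same_rack, [first, second])

    return [first, second, third]


topology = {"rack1": ["dn1", "dn2"], "rack2": ["dn3", "dn4"]}
targets = choose_targets(topology, writer="dn1")
print(targets)  # first is dn1, second is dn3 or dn4, third is dn2
```

Passing `writer=None` models a client that is not running on a datanode, in which case the first replica lands on a random node.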

HDFS does not provide an API that lets applications store a file on a datanode of their choice.

+8

How does HDFS or the namenode choose which datanode to store the file on?

HDFS has a default policy, BlockPlacementPolicyDefault; more details can be found in the API documentation. It should be possible to extend BlockPlacementPolicy for custom behavior.
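To illustrate the idea of a pluggable policy, here is a minimal sketch of the pattern in Python. It does not use or mirror the Hadoop API: `PlacementPolicy`, `RandomPolicy`, `PreferTaggedPolicy`, and the node names are hypothetical stand-ins for a base policy, a default, and a custom subclass.

```python
import random
from abc import ABC, abstractmethod
from typing import List, Optional


class PlacementPolicy(ABC):
    """Hypothetical stand-in for an extensible placement policy base class."""

    @abstractmethod
    def choose_targets(self, nodes: List[str], replicas: int) -> List[str]:
        """Return up to `replicas` distinct nodes to hold a block."""


class RandomPolicy(PlacementPolicy):
    """Stand-in default: pick distinct nodes at random."""

    def __init__(self, seed: Optional[int] = None) -> None:
        self._rng = random.Random(seed)

    def choose_targets(self, nodes: List[str], replicas: int) -> List[str]:
        return self._rng.sample(nodes, min(replicas, len(nodes)))


class PreferTaggedPolicy(PlacementPolicy):
    """Custom behavior: prefer nodes whose name starts with a given prefix."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix

    def choose_targets(self, nodes: List[str], replicas: int) -> List[str]:
        preferred = [n for n in nodes if n.startswith(self.prefix)]
        others = [n for n in nodes if not n.startswith(self.prefix)]
        # Preferred nodes come first; the rest only fill the remaining slots.
        return (preferred + others)[:replicas]


nodes = ["ssd-1", "hdd-1", "ssd-2", "hdd-2"]
policy: PlacementPolicy = PreferTaggedPolicy("ssd-")
print(policy.choose_targets(nodes, 3))  # ['ssd-1', 'ssd-2', 'hdd-1']
```

The caller only depends on the base class, so swapping the policy does not change the code that asks for targets.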

Does hdfs provide any APIs for storing a file in a specific datanode on its own?

Placement should not be tied to a particular datanode. Leaving the choice to the placement policy is what keeps HDFS fault tolerant and scalable.

+10

If you prefer diagrams, here is an image (source):
[image not available]

+5

Now, with the HDFS-385 patch, we can choose a block placement policy so that all the blocks of a file are placed on the same node (and similarly for the nodes holding the replicas). Read this blog post on the topic, and see its comments section.
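One way such a co-locating policy could behave is sketched below. This is an assumption-laden illustration and not the code from the patch: `CoLocatingPolicy`, its method names, and the sample path are invented for the example. The idea is to remember the targets chosen for a file's first block and reuse them for every later block of that file.

```python
import random
from typing import Dict, List, Optional


class CoLocatingPolicy:
    """Hypothetical policy that keeps every block of a file on the same nodes."""

    def __init__(self, nodes: List[str], replicas: int = 3,
                 rng: Optional[random.Random] = None) -> None:
        self.nodes = list(nodes)
        self.replicas = replicas
        self.rng = rng or random.Random()
        self._targets_by_file: Dict[str, List[str]] = {}

    def choose_targets(self, path: str) -> List[str]:
        # First block of a file: pick distinct nodes and remember them.
        if path not in self._targets_by_file:
            count = min(self.replicas, len(self.nodes))
            self._targets_by_file[path] = self.rng.sample(self.nodes, count)
        # Later blocks of the same file reuse the remembered nodes.
        return list(self._targets_by_file[path])


policy = CoLocatingPolicy(["dn1", "dn2", "dn3", "dn4", "dn5"])
block0 = policy.choose_targets("/data/big.log")
block1 = policy.choose_targets("/data/big.log")
print(block0 == block1)  # True: both blocks land on the same three nodes
```

The trade-off is that a large file no longer spreads its blocks across the cluster, so reads of that file cannot be parallelized over many nodes.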

+2

This image shows how the replication process works: [1]

As you can see, when the namenode instructs the datanodes to store data, the first replica is stored on the local machine and the other two replicas are placed on a different rack.

If any replica fails, the data is recovered from another replica. The chance of every replica failing is like a ceiling fan falling on your head while you sleep :p i.e., it is very unlikely.
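As a rough back-of-the-envelope check of that intuition, the sketch below assumes each node fails independently with the same probability, which is a simplification (nodes on the same rack can fail together). The function name and the 1% figure are illustrative only.

```python
def all_replicas_lost(p_node_failure: float, replicas: int = 3) -> float:
    """Probability that every replica is lost, assuming independent failures."""
    return p_node_failure ** replicas


# With a hypothetical 1% chance of losing any one node, losing all three
# replicas at once is about one in a million.
print(all_replicas_lost(0.01))
```

Spreading replicas across racks is what makes the independence assumption more reasonable in practice.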

-1
