Namenode format does not free up disk space

After stopping the cluster with ./stop-all.sh and then running hadoop namenode -format , I see that the datanodes still report the same disk usage, i.e. the space has not been freed.

Why is this?

hadoop hdfs
3 answers

You can manually delete the data on the DataNodes before formatting the NameNode

RMR

 Usage: hadoop fs -rmr URI [URI …] 

Recursive version of delete. Example:

 hadoop fs -rmr /user/hadoop/dir
 hadoop fs -rmr hdfs://nn.example.com/user/hadoop/dir

Exit code:

Returns 0 on success and -1 on error.
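
For example, a minimal sketch of that workflow (here /user/hadoop/dir is just a stand-in for whatever HDFS paths actually hold your data):

 hadoop fs -rmr /user/hadoop/dir   # recursively delete the HDFS data (example path)
 ./stop-all.sh                     # then stop the cluster
 hadoop namenode -format           # and format the namenode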


As an alternative

Data nodes must be reformatted whenever the name node is. Here I see two approaches:

  • Reformat the whole cluster: provide a "start-dfs -format" option or create a separate "format-dfs" script that formats all the components of the cluster together. The question here is whether the script should also start the cluster after formatting.
  • Format only the name node. When data nodes connect to the name node, they format their storage directories if they see that the namespace is empty and its cTime = 0. The drawback of this approach is that we could lose data node blocks belonging to another cluster if, by mistake, it connects to the empty name node.

https://issues.apache.org/jira/browse/HDFS-107


Formatting Namenode will not format Datanode.

It will only format the metadata held by your namenode, i.e. your namenode will no longer know where your data is. namenode -format will also assign a new namespaceID to the namenode.

You will need to change the namespaceID on your datanodes to match for them to work again. It is stored in dfs/data/current/VERSION.
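
As a rough sketch (the paths below are examples; substitute the dfs.name.dir and dfs.data.dir locations from your own configuration):

 cat /path/to/dfs/name/current/VERSION   # on the namenode: note the new namespaceID
 vi /path/to/dfs/data/current/VERSION    # on each datanode: set namespaceID to the namenode's value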

There is an open JIRA proposing that the Datanodes be formatted as well whenever the Namenode is formatted: HDFS-107.


When formatting a namenode, the space is not cleared. This must be done manually.

To do this,

First stop the cluster by calling ./stop-all.sh or ./stop-mapred.sh and ./stop-dfs.sh in the correct order.

Then delete the datanode's data directory, i.e. the directory specified by dfs.data.dir in hdfs-site.xml, or ${hadoop.tmp.dir}/dfs/data if dfs.data.dir is not set.
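
For illustration only, a sketch assuming dfs.data.dir resolves to /app/hadoop/tmp/dfs/data (an example path; adjust it to your own configuration):

 ./stop-all.sh                     # stop the MapReduce and HDFS daemons
 rm -rf /app/hadoop/tmp/dfs/data   # on every datanode: remove the configured data directory (example path)
 hadoop namenode -format           # reformat the namenode
 ./start-all.sh                    # bring the cluster back up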

Running hadoop fs -rmr (mentioned in one of the other answers to this question) before doing the format is actually the better approach, unless you are like me and only realized AFTER formatting the namenode that the datanode space had not been cleared ;)

