HDFS DataNode disconnected from NameNode

From time to time I get the following errors in Cloudera Manager:

This DataNode is not connected to one or more of its NameNode(s). 

and

 The Cloudera Manager agent got an unexpected response from this role web server. 

(usually together, sometimes only one of them)

In most references to these errors on SO and Google, the problem is a configuration issue (and the DataNode never connects to the NameNode at all).

In my case, the DataNodes usually connect at startup but lose the connection after a while, so it does not look like a misconfiguration.

  • Any other ideas?
  • Is it possible to force the DataNode to reconnect to the NameNode?
  • Is it possible to "ping" the NameNode from the DataNode (i.e. simulate the DataNode's connection attempt)?
  • Could this be some kind of resource problem (too many open files / connections)?
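To illustrate the "ping" and resource questions, here is what I can run from the DataNode host (the hostname is a placeholder, and 8020 is only the common NameNode client-RPC default, so substitute your own values):

```shell
# Placeholder address -- substitute your NameNode's host and RPC port.
NN_HOST=namenode.example.com
NN_PORT=8020

# "Ping" the NameNode's RPC port from the DataNode host using bash's
# built-in /dev/tcp pseudo-device (no extra tools required):
if timeout 5 bash -c "exec 3<>/dev/tcp/$NN_HOST/$NN_PORT" 2>/dev/null; then
    echo "NameNode port reachable"
else
    echo "NameNode port unreachable"
fi

# Check the open-file situation, a common cause of dropped connections:
ulimit -n                  # per-process file-descriptor limit in this shell
cat /proc/sys/fs/file-max  # system-wide file-descriptor limit
```

If the port stops being reachable around the time a DataNode drops off, that would point at the network or a firewall rather than HDFS itself.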

Log samples (the exact errors vary from time to time):

    2014-02-25 06:39:49,179 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: exception: java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.56.144.18:50010 remote=/10.56.144.28:48089]
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:165)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:153)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:114)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:504)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:673)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:338)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:92)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:64)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
        at java.lang.Thread.run(Thread.java:662)
    2014-02-25 06:39:49,180 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.56.144.18:50010, dest: /10.56.144.28:48089, bytes: 132096, op: HDFS_READ, cliID: DFSClient_NONMAPREDUCE_1315770947_27, offset: 0, srvID: DS-990970275-10.56.144.18-50010-1384349167420, blockid: BP-1381780028-10.56.144.16-1384349161741:blk_-8718668700255896235_5121440, duration: 480291679056
    2014-02-25 06:39:49,180 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.56.144.18, storageID=DS-990970275-10.56.144.18-50010-1384349167420, infoPort=50075, ipcPort=50020, storageInfo=lv=-40;cid=cluster16;nsid=7043943;c=0):Got exception while serving BP-1381780028-10.56.144.16-1384349161741:blk_-8718668700255896235_5121440 to /10.56.144.28:48089
    java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.56.144.18:50010 remote=/10.56.144.28:48089]
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:165)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:153)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:114)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:504)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:673)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:338)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:92)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:64)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
        at java.lang.Thread.run(Thread.java:662)
    2014-02-25 06:39:49,181 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: host.com:50010:DataXceiver error processing READ_BLOCK operation src: /10.56.144.28:48089 dest: /10.56.144.18:50010
    java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.56.144.18:50010 remote=/10.56.144.28:48089]
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:165)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:153)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:114)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:504)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:673)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:338)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:92)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:64)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
        at java.lang.Thread.run(Thread.java:662)
hadoop hdfs cloudera
3 answers

Hadoop uses specific ports for communication between the DataNode and the NameNode, and a firewall might be blocking them. Look up the default ports on the Cloudera website and test the connection to the NameNode on those specific ports.
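A sketch of such a check — `namenode.example.com` is a placeholder, and the port numbers are the classic HDFS defaults of that era (8020 NameNode RPC, 50070 NameNode web UI; 50010/50020/50075 on the DataNode side); confirm the actual list against Cloudera's documentation:

```shell
NN_HOST=namenode.example.com   # placeholder -- use your NameNode's FQDN

# Returns 0 if host:port accepts a TCP connection (bash /dev/tcp trick).
check_port() {
    timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

for port in 8020 50070; do
    if check_port "$NN_HOST" "$port"; then
        echo "port $port: open"
    else
        echo "port $port: blocked, closed, or host unreachable"
    fi
done
```

Run it from the DataNode host: if a port that was open becomes unreachable when the disconnects happen, the firewall (or network) is the likely cause.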


If you are using Linux, make sure you have configured the following correctly:

  1. Disable SELinux. Run the getenforce command in the CLI; if it prints Enforcing, SELinux is enabled. Change it in the /etc/selinux/config file.

  2. Disable the firewall.

  3. Make sure you have the NTP service installed, so the nodes' clocks stay in sync.

  4. Make sure your server can SSH to all client nodes.

  5. Ensure that every node has an FQDN (fully qualified domain name) and an entry in /etc/hosts with its name and IP.
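The checklist above can be audited with a few commands on each node (a sketch; each check falls back to a message if the corresponding tool is missing):

```shell
# 1. SELinux: want "Permissive" or "Disabled"
command -v getenforce >/dev/null && getenforce || echo "SELinux tools not present"

# 2. Firewall: want firewalld inactive (or check iptables on older distros)
state=$(systemctl is-active firewalld 2>/dev/null)
echo "firewalld: ${state:-unknown}"

# 3. NTP: want "synchronised"
command -v ntpstat >/dev/null && ntpstat || echo "ntpstat not present or not synchronised"

# 4. SSH: try e.g. `ssh <node> true` from this host to each other node

# 5. FQDN and /etc/hosts entries for every node
hostname -f 2>/dev/null || hostname
grep -v '^#' /etc/hosts
```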

If all of these settings are already in place, please attach the log from one of the DataNodes that got disconnected.


I encountered the error

"This DataNode is not connected to one or more of its NameNode(s)."

and solved it by turning off safe mode and restarting the HDFS service.
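For reference, safe mode can be inspected and left from the command line with the standard `hdfs dfsadmin` subcommands (run as the HDFS superuser):

```shell
# Is the NameNode in safe mode?
hdfs dfsadmin -safemode get      # prints "Safe mode is ON" or "Safe mode is OFF"

# Force it out of safe mode:
hdfs dfsadmin -safemode leave
```

Then restart the HDFS service from Cloudera Manager.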

