From time to time I get the following errors in Cloudera Manager:
This DataNode is not connected to one or more of its NameNode(s).
and
The Cloudera Manager agent got an unexpected response from this role web server.
(usually both together, sometimes only one of them)
Most references to these errors on SO and Google attribute the problem to misconfiguration (where the DataNode never manages to connect to the NameNode at all).
In my case, the DataNodes usually connect at startup but lose the connection after a while - so it does not look like a bad configuration.
- Are there any other possibilities?
- Is it possible to force a DataNode to reconnect to the NameNode?
- Is it possible to "ping" the NameNode from a DataNode (i.e., simulate the DataNode's connection attempt)?
- Could this be some kind of resource problem (too many open files/connections)?
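On the "ping" question, one low-level check is to simulate the DataNode's connection attempt with a plain TCP probe against the NameNode's RPC port. A minimal sketch (the hostname below is a placeholder, and 8020 is only the common default NameNode RPC port - yours may differ, so check your cluster's configuration):

```python
import socket

def can_reach(host, port, timeout=5.0):
    """Open a plain TCP connection to host:port, the way the DataNode's
    socket layer would, and report whether the connection succeeds.
    This only tests network reachability, not the HDFS protocol itself."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Hypothetical hostname -- replace with your actual NameNode address/port.
# can_reach("namenode.example.com", 8020)
```

If this returns False while the errors are occurring, the problem is at the network level (firewall, routing, host overload); if it returns True, the TCP path is fine and the disconnects are more likely a timeout or resource issue inside the daemons.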
Log samples (the exact errors vary from time to time):
2014-02-25 06:39:49,179 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: exception: java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.56.144.18:50010 remote=/10.56.144.28:48089]
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:165)
    at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:153)
    at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:114)
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:504)
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:673)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:338)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:92)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:64)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
    at java.lang.Thread.run(Thread.java:662)
2014-02-25 06:39:49,180 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.56.144.18:50010, dest: /10.56.144.28:48089, bytes: 132096, op: HDFS_READ, cliID: DFSClient_NONMAPREDUCE_1315770947_27, offset: 0, srvID: DS-990970275-10.56.144.18-50010-1384349167420, blockid: BP-1381780028-10.56.144.16-1384349161741:blk_-8718668700255896235_5121440, duration: 480291679056
2014-02-25 06:39:49,180 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.56.144.18, storageID=DS-990970275-10.56.144.18-50010-1384349167420, infoPort=50075, ipcPort=50020, storageInfo=lv=-40;cid=cluster16;nsid=7043943;c=0):Got exception while serving BP-1381780028-10.56.144.16-1384349161741:blk_-8718668700255896235_5121440 to /10.56.144.28:48089
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.56.144.18:50010 remote=/10.56.144.28:48089]
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:165)
    at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:153)
    at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:114)
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:504)
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:673)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:338)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:92)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:64)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
    at java.lang.Thread.run(Thread.java:662)
2014-02-25 06:39:49,181 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: host.com:50010:DataXceiver error processing READ_BLOCK operation src: /10.56.144.28:48089 dest: /10.56.144.18:50010
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.56.144.18:50010 remote=/10.56.144.28:48089]
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:165)
    at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:153)
    at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:114)
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:504)
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:673)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:338)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:92)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:64)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
    at java.lang.Thread.run(Thread.java:662)
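One observation on these traces: the 480000 ms figure is exactly the default value of `dfs.datanode.socket.write.timeout` (8 minutes), so each error means a reader stopped draining the connection for that long before the DataNode gave up. If slow or stalled clients are the cause rather than misconfiguration, one thing to experiment with is raising that timeout, and - for the open-files/connections angle - the DataNode's transceiver thread cap. A sketch of the relevant `hdfs-site.xml` properties (the values shown are illustrative, not recommendations; in Cloudera Manager these would go into the HDFS safety valve rather than the file directly):

```xml
<!-- How long a DataNode waits for the client side of a transfer
     before failing with SocketTimeoutException (default 480000 ms). -->
<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <value>960000</value>
</property>

<!-- Upper bound on concurrent DataXceiver transfer threads per DataNode;
     exhausting it can also surface as dropped connections under load. -->
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>8192</value>
</property>
```

Checking the OS-level open-file limit of the DataNode process (`ulimit -n` for the hdfs user) alongside these settings would help confirm or rule out the resource hypothesis.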
hadoop hdfs cloudera
Ophir yoktan