There are 0 datanode(s) running and no node(s) are excluded in this operation

I set up a multi-node Hadoop cluster. The NameNode and the Secondary NameNode run on the same machine, and the cluster has only one DataNode. All nodes are configured on Amazon EC2 instances.

These are the configuration files on the master node:

masters
54.68.218.192 (public IP of the master node)

slaves
54.68.169.62 (public IP of the slave node)

core-site.xml

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

mapred-site.xml

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

hdfs-site.xml

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/hadoop_store/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.name.dir</name>
    <value>file:/usr/local/hadoop_store/hdfs/datanode</value>
  </property>
</configuration>

Now the configuration files on the DataNode:

core-site.xml

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://54.68.218.192:10001</value>
  </property>
</configuration>

mapred-site.xml

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>54.68.218.192:10002</value>
  </property>
</configuration>

hdfs-site.xml

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/hadoop_store/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.name.dir</name>
    <value>file:/usr/local/hadoop_store/hdfs/datanode</value>
  </property>
</configuration>

Running jps on the NameNode gives the following:

5696 NameNode
6504 Jps
5905 SecondaryNameNode
6040 ResourceManager

and jps on the DataNode:

2883 DataNode
3496 Jps
3381 NodeManager

which seems right to me.

Now when I try to run the put command:

 hadoop fs -put count_inputfile /test/input/ 

This gives me the following error:

 put: File /count_inputfile._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation. 

The DataNode logs say the following:

hadoop-datanode log:
INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 54.68.218.192/54.68.218.192:10001. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

The YARN NodeManager log:

 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 

The NameNode web interface (port 50070) shows that there are 0 live nodes and 0 dead nodes, and the DFS usage shows 100%.

I have also disabled IPv6.

Several websites say that I should also edit /etc/hosts. I edited it too, and it looks like this:

127.0.0.1 localhost
172.31.25.151 ip-172-31-25-151.us-west-2.compute.internal
172.31.25.152 ip-172-31-25-152.us-west-2.compute.internal

Why am I still getting the error?

+17
ubuntu amazon-ec2 hadoop hdfs hadoop2
11 answers

Two things worked for me

STEP 1: stop hadoop and clean temp files from hduser

 sudo rm -R /tmp/* 

You may also need to delete and recreate /app/hadoop/tmp (I needed this when changing the Hadoop version from 2.2.0 to 2.7.0):

sudo rm -r /app/hadoop/tmp
sudo mkdir -p /app/hadoop/tmp
sudo chown hduser:hadoop /app/hadoop/tmp
sudo chmod 750 /app/hadoop/tmp

STEP 2: format the NameNode

 hdfs namenode -format 

Now I see a DataNode

hduser@prayagupd:~$ jps
19135 NameNode
20497 Jps
19477 DataNode
20447 NodeManager
19902 SecondaryNameNode
20106 ResourceManager
+19

I had the same problem after an abnormal shutdown of the node. The web UI also showed that no DataNode was listed.

It works now, after deleting the files from the datanode folder and restarting the services:

stop-all.sh
rm -rf /usr/local/hadoop_store/hdfs/datanode/*
start-all.sh

+7

@Learner,
I had this problem with DataNodes not showing up in the NameNode web UI. I solved it with these steps on Hadoop 2.4.1.

Do this on all nodes (master and slaves):

1. Delete all temporary files (by default in /tmp): sudo rm -R /tmp/* .
2. Now try connecting to all nodes over ssh using ssh username@host and add the master's key to each of them using ssh-copy-id -i ~/.ssh/id_rsa.pub username@host so that the master has unrestricted access to the slaves (not having this can be the reason for the connection failures).
3. Format the NameNode with hadoop namenode -format and try restarting the daemons.

+5

In my case the culprit was firewalld. It was running with its default configuration, which does not allow communication between the nodes. My Hadoop cluster was a test cluster, so I simply stopped the service. If your servers are in production, you should open the Hadoop ports in firewalld instead (a sketch of that follows the commands below).

service firewalld stop
chkconfig firewalld off
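For the production case, a minimal sketch of opening the ports instead of disabling firewalld; the port numbers below are the usual Hadoop 2.x defaults plus the ones that appear in this question, so adjust them to your own configuration:

sudo firewall-cmd --permanent --add-port=9000/tcp    # NameNode RPC (fs.default.name)
sudo firewall-cmd --permanent --add-port=50070/tcp   # NameNode web UI
sudo firewall-cmd --permanent --add-port=50010/tcp   # DataNode data transfer
sudo firewall-cmd --permanent --add-port=50020/tcp   # DataNode IPC
sudo firewall-cmd --permanent --add-port=8031/tcp    # ResourceManager resource tracker
sudo firewall-cmd --reload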
+3

I had the same error. My user did not have permission to the HDFS directories on disk, so I gave my user permission:

chmod 777 /usr/local/hadoop_store/hdfs/namenode
chmod 777 /usr/local/hadoop_store/hdfs/datanode
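A less permissive alternative, as a sketch, assuming Hadoop runs as the hduser:hadoop user and group mentioned in other answers here:

sudo chown -R hduser:hadoop /usr/local/hadoop_store/hdfs/namenode /usr/local/hadoop_store/hdfs/datanode
sudo chmod -R 750 /usr/local/hadoop_store/hdfs/namenode /usr/local/hadoop_store/hdfs/datanode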
+1

In my case, the necessary properties were missing from hdfs-site.xml (Hadoop 3.0.0, installed with Homebrew on macOS). The file:/// prefix is not a typo.

<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:///usr/local/Cellar/hadoop/hdfs/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:///usr/local/Cellar/hadoop/hdfs/datanode</value>
</property>
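If those directories do not exist yet, a rough sketch of creating them (paths taken from this answer; the exact owner depends on which user runs Hadoop on your machine):

mkdir -p /usr/local/Cellar/hadoop/hdfs/namenode /usr/local/Cellar/hadoop/hdfs/datanode
sudo chown -R "$USER" /usr/local/Cellar/hadoop/hdfs
hdfs namenode -format    # format once after pointing the NameNode at a new, empty directory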
+1

This is probably because the cluster ID of the DataNodes does not match the cluster ID of the NameNode. The cluster ID can be seen in the VERSION file found in the data directories of both the NameNode and the DataNodes.

This happens when you format your NameNode and then restart the cluster, but the DataNodes are still trying to connect using the previous clusterID. For a successful connection you need the correct IP address as well as a matching cluster ID on the nodes.
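A quick way to check for the mismatch, as a sketch; the paths below are the dfs.namenode.name.dir / dfs.datanode.name.dir values from the question, so substitute your own:

grep clusterID /usr/local/hadoop_store/hdfs/namenode/current/VERSION   # on the NameNode
grep clusterID /usr/local/hadoop_store/hdfs/datanode/current/VERSION   # on the DataNode
# if the two clusterID values differ, the DataNode will not register with the NameNode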

So try reformatting the NameNode and DataNodes, or simply configure the DataNodes and NameNode to use newly created folders.

This should solve your problem.

Deleting the files from the current DataNode folder also removes the old VERSION file, and a new VERSION file is obtained when the DataNode reconnects to the NameNode.

Example: the DataNode directory in the configuration is /hadoop2/datanode

 $ rm -rvf /hadoop2/datanode/* 

Then restart the services. If you reformat your NameNode, do that before this step. Each time you reformat the NameNode it gets a new, randomly generated ID, which will not match the old ID stored in your DataNodes.

So, follow this sequence every time:

If you format the NameNode, then delete the contents of the DataNode data directory (or configure the DataNode to use a newly created directory), and then start your NameNode and DataNodes. A rough command sequence is sketched below.
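As a sketch, assuming the standard Hadoop scripts are on the PATH and using the DataNode directory from this answer:

# on the master
stop-dfs.sh
hdfs namenode -format            # generates a new clusterID
# on every DataNode: clear the old data directory
rm -rf /hadoop2/datanode/*
# back on the master
start-dfs.sh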

0

The value of the fs.default.name property in core-site.xml, on both the master and the slave, should point to the master machine. So it will be something like this:

<property>
  <name>fs.default.name</name>
  <value>hdfs://master:9000</value>
</property>

where master is the hostname in the /etc/hosts file that points to the master node.
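For example, a sketch of the corresponding /etc/hosts entries on every node, using the private EC2 IPs from the question; which IP belongs to which machine, and the names master and slave1, are assumptions here:

172.31.25.151   master
172.31.25.152   slave1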

0

Have you tried clearing the /tmp folder?

Before cleaning, the DataNode did not show up:

86528 SecondaryNameNode
87719 Jps
86198 NameNode
78968 RunJar
79515 RunJar
63964 RunNiFi
63981 NiFi

After cleaning

 sudo rm -rf /tmp/* 

It worked for me

89200 Jps
88859 DataNode
0

@Mustafacanturk's solution of disabling the firewall worked for me. I thought the DataNodes were starting because they showed up when running jps, but when trying to upload files I got the "could only be replicated to 0 nodes" message. In fact, none of the web interfaces (http://nn1:50070) were reachable because of the firewall. I had turned the firewall off when installing Hadoop, but for some reason it was up again. That said, sometimes cleaning or recreating the temporary folders (hadoop.tmp.dir), or even the dfs.data.dir and dfs.namenode.name.dir folders, and reformatting the NameNode was the solution.
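A quick sketch for checking whether the firewall is what is blocking things, using the nn1 hostname and port 50070 from this answer:

sudo firewall-cmd --state                 # is firewalld running on the NameNode host?
curl -sI http://nn1:50070 | head -n 1     # can the NameNode web UI be reached from another node?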

0

1) First stop all services using the stop-all.sh command

2) Delete all files inside the datanode directory: rm -rf /usr/local/hadoop_store/hdfs/datanode/*

3) Then start all services using the start-all.sh command

You can check whether all your services are running with the jps command.

Hope it works !!!

-1
