I have set up a multi-node Hadoop cluster. The NameNode and SecondaryNameNode run on the same machine, and there is only one DataNode in the cluster. All nodes run on Amazon EC2 instances.
These are the configuration files on the master node:
masters
54.68.218.192 (public IP of the master node)

slaves
54.68.169.62 (public IP of the slave node)
core-site.xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/hadoop_store/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.name.dir</name>
    <value>file:/usr/local/hadoop_store/hdfs/datanode</value>
  </property>
</configuration>
Now the configuration files on the DataNode:
core-site.xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://54.68.218.192:10001</value>
  </property>
</configuration>
mapred-site.xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>54.68.218.192:10002</value>
  </property>
</configuration>
hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/hadoop_store/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.name.dir</name>
    <value>file:/usr/local/hadoop_store/hdfs/datanode</value>
  </property>
</configuration>
Running jps on the NameNode gives the following:
5696 NameNode
6504 Jps
5905 SecondaryNameNode
6040 ResourceManager
and jps on the DataNode:
2883 DataNode
3496 Jps
3381 NodeManager
which seems right to me.
Now, when I try to run a put command:
hadoop fs -put count_inputfile /test/input/
I get the following error:
put: File /count_inputfile._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation.
The DataNode log says the following:
INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 54.68.218.192/54.68.218.192:10001. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
and the YARN NodeManager log says:
INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
The NameNode web interface (port 50070) shows that there are 0 live nodes and 0 dead nodes, and the DFS used value is 100%.
I have also disabled IPv6.
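For reference, this is the usual way IPv6 gets disabled on these machines (a sketch of the standard sysctl approach; the exact entries I used may differ slightly):

```shell
# /etc/sysctl.conf — typical entries to disable IPv6
# (assumption: standard approach; applied with `sudo sysctl -p`)
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
```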
I found on several websites that I should also edit /etc/hosts. I have edited it on both nodes, and it now looks like this:
127.0.0.1 localhost
172.31.25.151 ip-172-31-25-151.us-west-2.compute.internal
172.31.25.152 ip-172-31-25-152.us-west-2.compute.internal
Why am I still getting the error?