Retrieval errors often occur because of DNS issues. Check each datanode to make sure that the host name and IP address it is configured with match what DNS resolution returns for that host name.
You can do this by visiting each node in your cluster and running hostname and ifconfig, paying attention to the host name and the IP address returned. For example, suppose this returns the following:
namenode.foo.com 10.1.1.100
datanode1.foo.com 10.1.1.1
datanode2.foo.com 10.1.1.2
datanode3.foo.com 10.1.1.3
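If you would rather script this step, the following Python sketch prints roughly the same two values on the node where it runs. It approximates ifconfig rather than replacing it. socket.gethostname() returns the same name as the hostname command, but the IP is only the address of the interface the OS would use to reach a probe address (10.255.255.255, an arbitrary choice assumed here). On a node with several interfaces you will therefore see just one of them. Connecting a UDP socket sends no packets. If there is no route, the sketch falls back to 127.0.0.1. The function name local_identity is hypothetical.

```python
import socket


def local_identity(probe_addr="10.255.255.255"):
    """Return (hostname, ip) for this machine (hypothetical helper).

    hostname matches the output of the `hostname` command.
    ip is the address of the interface the OS would route to probe_addr;
    connecting a UDP socket only selects a route and sends nothing.
    """
    hostname = socket.gethostname()
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        s.connect((probe_addr, 1))
        ip = s.getsockname()[0]
    except OSError:
        # No route to the probe address: fall back to loopback.
        ip = "127.0.0.1"
    finally:
        s.close()
    return hostname, ip
```

For example, print(*local_identity()) on each node gives one "name address" line per node, which you can collect into a table like the one above.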
Then, on every node, run nslookup on the host names returned by the other nodes. Make sure each returned IP address matches the one that ifconfig reported on that host. For example, on datanode1.foo.com you should run the following:
nslookup namenode.foo.com
nslookup datanode2.foo.com
nslookup datanode3.foo.com
and the addresses returned should be:
10.1.1.100
10.1.1.2
10.1.1.3
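The cross-check can also be scripted. The following is a minimal Python sketch that assumes you have collected the host name / IP pairs into a dictionary. EXPECTED is a hypothetical name, filled here with the example values from this answer. check_resolution, also hypothetical, resolves every name and reports the ones whose address does not match. Note that socket.gethostbyname goes through the system resolver, which typically also consults /etc/hosts, whereas nslookup queries the DNS server directly. It also returns only a single IPv4 address per name. A disagreement between the two tools is itself worth investigating.

```python
import socket

# Hypothetical table: the host name / IP pairs collected with `hostname`
# and `ifconfig` on every node (example values from this answer).
EXPECTED = {
    "namenode.foo.com": "10.1.1.100",
    "datanode1.foo.com": "10.1.1.1",
    "datanode2.foo.com": "10.1.1.2",
    "datanode3.foo.com": "10.1.1.3",
}


def check_resolution(expected, resolve=socket.gethostbyname):
    """Return {host: (expected_ip, resolved_ip_or_error)} for each mismatch.

    `resolve` defaults to the system resolver; pass another function
    (for example a fake one) to change how names are looked up.
    """
    mismatches = {}
    for host, want in expected.items():
        try:
            got = resolve(host)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            got = f"lookup failed: {exc}"
        if got != want:
            mismatches[host] = (want, got)
    return mismatches
```

Run it on each node, for example print(check_resolution(EXPECTED)). An empty dictionary means every name resolved to the expected address.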
If the job succeeds when you run it on a subset of the data, that is probably because the subset did not have enough partitions for any task to be scheduled on the incorrectly configured datanode(s).