S3N and S3A distcp do not work in Hadoop 2.6.0

Summary

The hadoop-2.6.0 distcp utility gives me No FileSystem for scheme: s3n. Adding hadoop-aws.jar to the classpath then gives me ClassNotFoundException: org.apache.hadoop.fs.s3a.S3AFileSystem.

More details

I have a stock hadoop-2.6.0 installation. I only unpacked the directories and set the following environment variables:

    export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/jre
    export HADOOP_COMMON_HOME=/opt/hadoop
    export HADOOP_HOME=$HADOOP_COMMON_HOME
    export HADOOP_HDFS_HOME=$HADOOP_COMMON_HOME
    export HADOOP_MAPRED_HOME=$HADOOP_COMMON_HOME
    export HADOOP_OPTS=-XX:-PrintWarnings
    export PATH=$PATH:$HADOOP_COMMON_HOME/bin
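As a quick sanity check that this layout is the one actually being picked up:

    # Confirm the hadoop launcher on PATH is the /opt/hadoop install
    which hadoop       # expect /opt/hadoop/bin/hadoop
    hadoop version     # expect "Hadoop 2.6.0" on the first line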

hadoop classpath outputs:

 /opt/hadoop/etc/hadoop:/opt/hadoop/share/hadoop/common/lib/*:/opt/hadoop/share/hadoop/common/*:/opt/hadoop/share/hadoop/hdfs:/opt/hadoop/share/hadoop/hdfs/lib/*:/opt/hadoop/share/hadoop/hdfs/*:/opt/hadoop/share/hadoop/yarn/lib/*:/opt/hadoop/share/hadoop/yarn/*:/opt/hadoop/share/hadoop/mapreduce/lib/*:/opt/hadoop/share/hadoop/mapreduce/*:/contrib/capacity-scheduler/*.jar:/opt/hadoop/share/hadoop/tools/lib/* 

When I try to run hadoop distcp -update hdfs:///files/to/backup s3n://${S3KEY}:${S3SECRET}@bucket/files/to/backup , I get Error: java.io.IOException: No FileSystem for scheme: s3n . If I use s3a, I get the same error, just complaining about s3a instead.

The internet told me that hadoop-aws.jar is not on the classpath by default. I added the following line to /opt/hadoop/etc/hadoop/hadoop-env.sh :

 HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HADOOP_COMMON_HOME/share/hadoop/tools/lib/* 
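To double-check that the extra entry actually shows up, something like this should work (paths as in the layout above):

    # List the classpath one entry per line and look for the tools directory
    hadoop classpath | tr ':' '\n' | grep tools/lib

    # Confirm the AWS jar itself is present in that directory
    ls /opt/hadoop/share/hadoop/tools/lib/hadoop-aws-*.jar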

and now hadoop classpath has the following appended to it:

 :/opt/hadoop/share/hadoop/tools/lib/* 

which should cover /opt/hadoop/share/hadoop/tools/lib/hadoop-aws-2.6.0.jar. Now I get:

 Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found 

The jar file contains a class that cannot be found:

    unzip -l /opt/hadoop/share/hadoop/tools/lib/hadoop-aws-2.6.0.jar | grep S3AFileSystem
      28349  2014-11-13 21:20   org/apache/hadoop/fs/s3a/S3AFileSystem.class

Is there a particular order in which these jars need to be added, or am I missing something else critical?

+5
3 answers

You can solve the s3n problem by adding the following lines to core-site.xml

    <property>
      <name>fs.s3n.impl</name>
      <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
      <description>The FileSystem for s3n: (Native S3) uris.</description>
    </property>
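For the s3a scheme, the analogous property (a sketch of the same idea applied to s3a) would be:

    <property>
      <name>fs.s3a.impl</name>
      <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
      <description>The FileSystem for s3a: uris.</description>
    </property>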

It should work after adding this property.

Edit: If this does not solve your problem, you will have to add the jars to the classpath. Can you check whether mapred-site.xml has mapreduce.application.classpath pointing to /usr/hdp//hadoop-mapreduce/*? That will include the other related jars on the classpath :)
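A quick way to check that from the shell (a sketch assuming the /opt/hadoop layout from the question; the path differs on an HDP install):

    # Show the mapreduce.application.classpath property, if it is set
    grep -A 3 'mapreduce.application.classpath' /opt/hadoop/etc/hadoop/mapred-site.xml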

+4

Working through this with Abhishek in the comments on his answer, the only change I needed to make was to mapred-site.xml :

    <property>
      <!-- Add to the classpath used when running an M/R job -->
      <name>mapreduce.application.classpath</name>
      <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*,$HADOOP_MAPRED_HOME/share/hadoop/tools/lib/*</value>
    </property>

No changes needed for any other xml or sh files.
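For what it's worth, with the tools jars on the classpath the keys do not have to be embedded in the URI; as a sketch, using the standard s3n credential properties with placeholder values:

    # s3n credential properties (for s3a the equivalents are
    # fs.s3a.access.key and fs.s3a.secret.key)
    hadoop distcp \
      -Dfs.s3n.awsAccessKeyId="$S3KEY" \
      -Dfs.s3n.awsSecretAccessKey="$S3SECRET" \
      -update hdfs:///files/to/backup s3n://bucket/files/to/backup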

+6

In current Hadoop (3.1.1), this approach no longer works. Instead, it can be fixed by uncommenting the HADOOP_OPTIONAL_TOOLS line in etc/hadoop/hadoop-env.sh. Among other tools, that enables the hadoop-aws library.
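Roughly, the uncommented line looks like this (showing only hadoop-aws here; the stock file lists several optional tools):

    # In etc/hadoop/hadoop-env.sh; hadoop-aws is the part that puts the
    # S3A connector and its AWS SDK dependency on the classpath
    export HADOOP_OPTIONAL_TOOLS="hadoop-aws"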

0
