Running EMR example, getting 301 errors

I am trying to run an example hadoop-streaming command:

hadoop-streaming -files streamingCode/wordSplitter.py \
    -mapper wordSplitter.py \
    -input s3://elasticmapreduce/samples/wordcount/input \
    -output streamingCode/wordCountOut \
    -reducer aggregate

but I keep getting this error:

 Exception in thread "main" com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.model.AmazonS3Exception: Moved Permanently (Service: Amazon S3; Status Code: 301; Error Code: 301 Moved Permanently; Request ID: 98038E504E150CEC), S3 Extended Request ID: IW1x5otBSepAnPgW/RKELCUI9dhADQvrXqU2Ase1CLIa0SWDFnBbTscXihrvHvNm2ZRxjjSJZ1Q= 

I think this is because my cluster is in us-west-2, but I cannot figure out how to format the S3 URL correctly (or maybe this is not the problem at all).

Edit: I changed the input URL to the following:

 s3://s3-us-west-2.amazonaws.com/elasticmapreduce/samples/wordcount/input 

Now I get the following error:

 Exception in thread "main" com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: BC8DB415C780DF84), S3 Extended Request ID: sx8W/+gvND2ssqQce9ZQsZTiqxmSJYZs8OiXgrjwL3dm0JRPaC7ceHor+yrHsPuKTjM2LUwkRAw= 

Edit: I have confirmed that the error really is caused by my cluster being in us-west-2: I created a cluster in us-east-1 and the same command works correctly. So the question is: how can I access this S3 bucket from a cluster in another region? Is it possible?

1 answer

Amazon changed the default behavior starting with emr-4.7.0, and we ran into this error when we upgraded our EMR version.

The solution is simple: add this property to the core-site configuration: fs.s3n.endpoint = s3.amazonaws.com
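One way to supply that property is through an EMR configuration object with the core-site classification, provided when the cluster is created. The following is only a sketch under that assumption: the helper name core_site_endpoint_config is hypothetical, and the endpoint value is the one given above.

```python
import json


def core_site_endpoint_config(endpoint="s3.amazonaws.com"):
    """Build an EMR configuration list that sets fs.s3n.endpoint in core-site.

    The function name is hypothetical; the property name and value come
    from the answer above.
    """
    return [
        {
            "Classification": "core-site",
            "Properties": {"fs.s3n.endpoint": endpoint},
        }
    ]


if __name__ == "__main__":
    # Print the JSON so it can be saved to a file or pasted into the
    # software settings when creating the cluster.
    print(json.dumps(core_site_endpoint_config(), indent=2))
```

The printed JSON can be saved to a file and passed at cluster creation, for example with the --configurations option of aws emr create-cluster. I have not verified this against every EMR release, so treat it as a starting point.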
