I am trying to run an example hadoop-streaming command:
hadoop-streaming -files streamingCode/wordSplitter.py \
  -mapper wordSplitter.py \
  -input s3://elasticmapreduce/samples/wordcount/input \
  -output streamingCode/wordCountOut \
  -reducer aggregate
but I keep getting this error:
Exception in thread "main" com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.model.AmazonS3Exception: Moved Permanently (Service: Amazon S3; Status Code: 301; Error Code: 301 Moved Permanently; Request ID: 98038E504E150CEC), S3 Extended Request ID: IW1x5otBSepAnPgW/RKELCUI9dhADQvrXqU2Ase1CLIa0SWDFnBbTscXihrvHvNm2ZRxjjSJZ1Q=
I think this is because my cluster is in us-west-2, but I cannot figure out how to format the S3 URL correctly (or maybe this is not the problem at all).
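As a sanity check, I believe the bucket's region can be looked up with the AWS CLI (assuming I have permission to call GetBucketLocation on this public bucket):

# Prints the bucket's LocationConstraint; a null/empty value means us-east-1
aws s3api get-bucket-location --bucket elasticmapreduce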
Edit: I changed the input to the following URL:
s3://s3-us-west-2.amazonaws.com/elasticmapreduce/samples/wordcount/input
Now I get the following error:
Exception in thread "main" com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: BC8DB415C780DF84), S3 Extended Request ID: sx8W/+gvND2ssqQce9ZQsZTiqxmSJYZs8OiXgrjwL3dm0JRPaC7ceHor+yrHsPuKTjM2LUwkRAw=
Edit: I have now confirmed that the error really is related to my cluster being in us-west-2: I created a cluster in us-east-1 and the same command works correctly. So the question is: how do I access the S3 bucket from another region? Is it possible?
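One thing I plan to try (untested; I'm assuming AWS publishes region-prefixed copies of the sample bucket, e.g. us-west-2.elasticmapreduce) is pointing the same command at a regional copy of the input, keeping the plain s3://bucket/key form and changing only the bucket name:

hadoop-streaming -files streamingCode/wordSplitter.py \
  -mapper wordSplitter.py \
  -input s3://us-west-2.elasticmapreduce/samples/wordcount/input \
  -output streamingCode/wordCountOut \
  -reducer aggregate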