In the context of Amazon Elastic MapReduce (Amazon EMR), you are looking for Bootstrap Actions:
Bootstrap actions let you pass a reference to a script stored in Amazon S3. This script can contain configuration settings and arguments related to Hadoop or Elastic MapReduce. Bootstrap actions run before Hadoop starts and before the node begins processing data. [emphasis mine]
The section Performing custom Bootstrap actions from the CLI provides a common use case:
$ ./elastic-mapreduce --create --stream --alive \
    --input s3n://elasticmapreduce/samples/wordcount/input \
    --mapper s3://elasticmapreduce/samples/wordcount/wordSplitter.py \
    --output s3n://myawsbucket \
    --bootstrap-action s3://elasticmapreduce/bootstrap-actions/download.sh
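For orientation, a bootstrap action is simply a script that runs on each node. The following is a minimal sketch of what such a script might look like; it is not the contents of download.sh, and the `TARGET_DIR` variable, the `prepare_node` function, and the `marker.txt` file are hypothetical names chosen for illustration.

```shell
#!/bin/bash
# Hypothetical bootstrap action: prepare a working directory on the node.
# TARGET_DIR and marker.txt are illustrative names, not part of EMR.
TARGET_DIR="${TARGET_DIR:-${TMPDIR:-/tmp}/bootstrap-demo}"

prepare_node() {
  # Create the directory and record that this step ran.
  mkdir -p "$1" || return 1
  printf 'bootstrapped\n' > "$1/marker.txt"
}

# Exit non-zero on failure so the problem is visible to whatever runs the script.
prepare_node "$TARGET_DIR" || exit 1
```

To try something like this, you would upload the script to an S3 bucket you control and pass its s3:// location via --bootstrap-action, as in the command above.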
In particular, there are separate bootstrap actions for configuring Hadoop and Java:
Hadoop (cluster)
You can specify Hadoop settings using the bootstrap action Configure Hadoop, which lets you set cluster-wide Hadoop settings, for example:
$ ./elastic-mapreduce --create \
    --bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop \
    --args "--site-config-file,s3://myawsbucket/config.xml,-s,mapred.task.timeout=0"
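The --args value is a single comma-separated string, which is easy to mistype once it carries several settings. Here is a small sketch of a helper that assembles it from key=value pairs. The function name `build_hadoop_args` is hypothetical, `io.sort.mb=200` is only an illustrative second setting, and repeating the `-s` flag once per pair is an assumption extrapolated from the single `-s,mapred.task.timeout=0` pair shown above.

```shell
#!/bin/bash
# Join key=value Hadoop settings into the comma-separated form passed via --args.
# Each setting is prefixed with -s, mirroring the -s,mapred.task.timeout=0 example.
# Assumption: the bootstrap action accepts -s repeated once per setting.
build_hadoop_args() {
  out=""
  for kv in "$@"; do
    # Reject anything that is not of the form key=value.
    case "$kv" in
      ?*=*) ;;
      *) echo "invalid setting: $kv" >&2; return 1 ;;
    esac
    # Add a comma separator only when something has already been emitted.
    out="${out:+$out,}-s,$kv"
  done
  printf '%s\n' "$out"
}

ARGS="$(build_hadoop_args mapred.task.timeout=0 io.sort.mb=200)"
echo "$ARGS"
```

This prints `-s,mapred.task.timeout=0,-s,io.sort.mb=200`, which could then be passed as `--args "$ARGS"`.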
Java (JVM)
You can specify custom JVM settings using the bootstrap action Configure Daemons:
This predefined bootstrap action lets you specify the heap size or other Java Virtual Machine (JVM) options for the Hadoop daemons. You can use this bootstrap action to configure Hadoop for large jobs that require more memory than Hadoop allocates by default. You can also use this bootstrap action to modify advanced JVM options, such as garbage collection behavior.
The following example sets the heap size to 2048 and configures the Java namenode option:
$ ./elastic-mapreduce --create --alive \
    --bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-daemons \
    --args --namenode-heap-size=2048,--namenode-opts=-XX:GCTimeRatio=19
Steffen opel