Boto3 EMR - Hive step

Is it possible to run Hive steps with boto3? I have been doing this with the AWS CLI, but from the docs ( http://boto3.readthedocs.org/en/latest/reference/services/emr.html#EMR.Client.add_job_flow_steps ) it seems that only JARs are accepted. If Hive steps are possible, where are the resources located?

thanks

+6
2 answers

An earlier version of Boto had a helper class called HiveStep, which made it easy to construct a job flow step for running a Hive job. In Boto3 the approach has changed: the classes are generated at runtime from the AWS REST API, so no such helper class exists. Looking at the source code of HiveStep, https://github.com/boto/boto/blob/2d7796a625f9596cbadb7d00c0198e5ed84631ed/boto/emr/step.py , you can see that it is a subclass of Step, a class with jar, args and mainclass properties, which closely match the requirements in Boto3.

It turns out that all job flow steps on EMR, including Hive ones, still need to be instantiated from a JAR. So you can run Hive steps through Boto3, but there is no helper class to make the definition easier to construct.

By looking at the approach HiveStep used in the previous version of Boto, you can construct a valid job flow step definition yourself.

Or you can revert to using the previous version of Boto.
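For the first option, building the definition by hand, a rough sketch could look like the following. Note that build_hive_step is a hypothetical helper name, not something Boto3 provides, and the choice of command-runner.jar with "hive -f" arguments is an assumption that depends on your EMR release; adjust the JAR and arguments to whatever your cluster expects.

```python
def build_hive_step(name, script_uri, action_on_failure="CONTINUE"):
    """Return a job flow step definition for a Hive script.

    Hypothetical helper: Boto3 has no equivalent of the old HiveStep class.
    """
    return {
        "Name": name,
        "ActionOnFailure": action_on_failure,
        "HadoopJarStep": {
            # Assumption: the cluster provides command-runner.jar
            "Jar": "command-runner.jar",
            # Assumption: the Hive script is run with "hive -f <script>"
            "Args": ["hive", "-f", script_uri],
        },
    }


# Example with a placeholder script location
step = build_hive_step("Hive_EMR_Step", "s3://user/hadoop/hive.hql")
print(step["HadoopJarStep"]["Args"])
```

The resulting dict can be passed inside the Steps list of add_job_flow_steps.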

+3

I managed to get this to work with Boto3:

    import boto3

    # First create your Hive command-line arguments
    hive_args = "hive -v -f s3://user/hadoop/hive.hql"

    # Split the Hive args into a list
    hive_args_list = hive_args.split()

    # Initialize your Hive step
    hiveEmrStep = [
        {
            'Name': 'Hive_EMR_Step',
            'ActionOnFailure': 'CONTINUE',
            'HadoopJarStep': {
                'Jar': 'command-runner.jar',
                'Args': hive_args_list
            }
        },
    ]

    # Create the Boto3 session and client
    # (AWS_REGION and AWS_PROFILE are your own region and profile names)
    session = boto3.Session(region_name=AWS_REGION, profile_name=AWS_PROFILE)
    client = session.client('emr')

    # Submit and execute the EMR step, where cluster_id is the ID of your
    # cluster from AWS EMR (ex: j-2GS7xxxxxx)
    client.add_job_flow_steps(JobFlowId=cluster_id, Steps=hiveEmrStep)
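If you also want to block until the submitted step has finished, something along these lines may help. It is only a sketch: submit_and_wait is a hypothetical function name, and it assumes the EMR client's 'step_complete' waiter and the 'StepIds' key in the add_job_flow_steps response behave as described in the Boto3 docs. The client is passed in, so create it as in the snippet above.

```python
def submit_and_wait(client, cluster_id, steps):
    """Submit steps to an EMR cluster and wait for each one to complete.

    Hypothetical helper; `client` is a Boto3 EMR client.
    """
    response = client.add_job_flow_steps(JobFlowId=cluster_id, Steps=steps)
    step_ids = response["StepIds"]

    # The waiter polls the step status and raises if the step fails
    # or the waiter runs out of attempts.
    waiter = client.get_waiter("step_complete")
    for step_id in step_ids:
        waiter.wait(ClusterId=cluster_id, StepId=step_id)

    return step_ids
```

Called as submit_and_wait(client, cluster_id, hiveEmrStep), it returns the list of step IDs once every step has completed.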
+3