Elastic MapReduce: external jars

It is easy enough to handle external jars when using Hadoop directly: the -libjars option does this for you. The question is how to do the same thing with EMR. There should be an easy way. I thought the -cachefile CLI option would do it, but I couldn't get it to work. Any ideas?

Thanks for the help.

+8
jar hadoop amazon-emr
3 answers

The best luck I had with external jar dependencies was copying them (via a bootstrap action) to /home/hadoop/lib on every node in the cluster. That path is on the classpath of every node. This is the only method that works regardless of where the code that uses the external jars runs (tool, job, or task).
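The bootstrap-action approach described above could be sketched roughly as follows. This is only an illustration under stated assumptions: it assumes Python and the AWS CLI (`aws s3 cp`) are available on the node when bootstrap actions run, that the jar URIs are passed as the bootstrap action's arguments, and that /home/hadoop/lib is the right target directory for your EMR release. The function name `build_copy_commands` and the bucket name in the usage note are hypothetical.

```python
#!/usr/bin/env python3
"""Hypothetical bootstrap action: copy dependency jars into a classpath directory."""
import subprocess
import sys

# Directory named in the answer; it may differ on other EMR releases.
DEFAULT_LIB_DIR = "/home/hadoop/lib"


def build_copy_commands(jar_uris, lib_dir=DEFAULT_LIB_DIR):
    """Return one `aws s3 cp` argument list per jar URI."""
    # A trailing slash makes the CLI treat the target as a directory.
    target = lib_dir.rstrip("/") + "/"
    return [["aws", "s3", "cp", uri, target] for uri in jar_uris]


def main(argv):
    # Assumption: jar URIs arrive as the bootstrap action's arguments.
    for command in build_copy_commands(argv[1:]):
        # check_call raises if a copy fails, so the bootstrap action fails loudly.
        subprocess.check_call(command)


if __name__ == "__main__":
    main(sys.argv)
```

Because a bootstrap action runs on every node, registering a script like this with an argument such as s3://my-bucket/libs/dep.jar would place the jar in the same directory across the whole cluster.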

+6

One option is to have the first step of your job flow put the JARs wherever they need to be. Or, if they are dependencies, you can package them together with your application JAR (which is probably on S3).
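The packaging option could look something like the sketch below. It relies on two things: a jar is a zip archive, and Hadoop's jar runner has traditionally added jars found under lib/ inside the job jar to the classpath. Whether that holds for your Hadoop and EMR versions is an assumption worth verifying. The helper name `bundle_dependencies` is hypothetical, and a build tool that produces a single "fat" jar is an equally valid route.

```python
import os
import zipfile


def bundle_dependencies(app_jar, dependency_jars):
    """Append each dependency jar to app_jar under lib/; return the entries added."""
    added = []
    # Mode "a" appends to the existing archive instead of rewriting it.
    with zipfile.ZipFile(app_jar, "a") as jar:
        existing = set(jar.namelist())
        for dep in dependency_jars:
            entry = "lib/" + os.path.basename(dep)
            if entry in existing:
                continue  # already bundled, avoid duplicate entries
            jar.write(dep, entry)
            existing.add(entry)
            added.append(entry)
    return added
```

After bundling, the application JAR is uploaded to S3 as usual and referenced from the step, so no separate distribution of the dependencies is needed.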

+3

FYI, on newer versions of EMR, /home/hadoop/lib is no longer used. Use /usr/lib/hadoop-mapreduce instead.

0