How do I install my code and dependencies on an AWS Spark cluster?

I can create a Spark cluster on AWS as described here.

However, my own Python code and its pip dependencies need to be present on the master and worker nodes. This is a lot of code, and the pip installation also compiles some native libraries, so I can't just have Spark distribute it at runtime using methods such as passing pyFiles to the SparkContext or the --py-files argument of spark-submit.
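For reference, this is roughly what that runtime-distribution route looks like (a minimal sketch; `mycode.zip`, `my_job.py`, and the package name are placeholders). It only ships pure-Python sources to the executors, which is why it does not help with pip packages that compile native extensions:

```bash
# Package the pure-Python part of the project...
zip -r mycode.zip mypackage/

# ...and ship it with the job (equivalent to passing pyFiles to SparkContext).
# This distributes Python source files only; native libraries built during
# `pip install` are not covered.
spark-submit \
  --master yarn \
  --py-files mycode.zip \
  my_job.py
```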

Of course, I could run a bash script right after running `aws emr create-cluster`, but I am wondering whether there is a more automatic way, so that I can avoid maintaining a large installation bash script.
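Concretely, the manual route I have in mind looks something like this (a sketch only; the release label, instance type, key name, node hostnames, and `install_deps.sh` are placeholder assumptions):

```bash
# Create the cluster (flag values are illustrative).
aws emr create-cluster \
  --name "spark-cluster" \
  --release-label emr-6.10.0 \
  --applications Name=Spark \
  --instance-type m5.xlarge \
  --instance-count 3 \
  --use-default-roles \
  --ec2-attributes KeyName=my-key

# install_deps.sh (hypothetical) would do the actual work on each node, e.g.
# install build tools, run `pip install -r requirements.txt`, and install the
# project itself, compiling the native extensions locally.

# Push and run the script on the master and every worker node.
NODE_HOSTS=(ec2-master.example.com ec2-worker-1.example.com ec2-worker-2.example.com)
for host in "${NODE_HOSTS[@]}"; do
  scp -i my-key.pem install_deps.sh "hadoop@${host}:"
  ssh -i my-key.pem "hadoop@${host}" 'bash install_deps.sh'
done
```

That works, but it is exactly the kind of large hand-maintained script I would like to avoid.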

So what is the best way to configure the cluster so that it includes my code and dependencies?
