How to connect PySpark (local computer) to my EMR cluster?

I deployed a 3-node AWS Elastic MapReduce (EMR) cluster with Apache Spark, and I have a local download of Apache Spark on my machine. From my local machine, I can access the master node via SSH:

ssh -i <key> hadoop@ec2-xx-xxx-xxx-xxx.compute-1.amazonaws.com

After SSHing into the master node, I can launch PySpark by running pyspark. In addition (albeit unsafe), I configured the master node's security group to accept TCP traffic on port 7077 specifically from my local machine's IP address.
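Roughly, the security-group rule I added is equivalent to the following boto3 call (the group ID, region, and IP below are placeholders, not my real values):

# Open port 7077 on the master node's security group to a single IP.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")   # placeholder region
ec2.authorize_security_group_ingress(
    GroupId="sg-xxxxxxxx",        # the master node's security group (placeholder)
    IpProtocol="tcp",
    FromPort=7077,
    ToPort=7077,
    CidrIp="203.0.113.10/32",     # my local machine's public IP (placeholder)
)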

However, I still cannot connect my local PySpark instance to my cluster:

MASTER=spark://ec2-master-node-public-address:7077 ./bin/pyspark

The above command raises a number of exceptions, and PySpark never manages to initialize the SparkContext.
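For completeness, the programmatic equivalent of the same attempt (the hostname is a placeholder for my master node's public DNS) fails the same way:

from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setMaster("spark://ec2-master-node-public-address:7077")   # placeholder hostname
        .setAppName("remote-emr-test"))
sc = SparkContext(conf=conf)   # never comes up; connection errors are raised instead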

Does anyone know how to successfully create a remote connection like the one I describe above?

amazon-ec2 emr apache-spark pyspark
1 answer

If your local computer is not the master node of the cluster, you cannot do this with AWS EMR. EMR does not run Spark as a standalone cluster, so there is no spark:// master listening on port 7077 to connect to; Spark on EMR runs on YARN, and YARN applications are submitted from a machine that has the cluster's Hadoop configuration, which in practice means the master node itself.
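What does work is running PySpark on the master node itself against YARN. A minimal sketch, assuming the default EMR setup where Spark is configured for YARN (run it after SSHing into the master node):

from pyspark import SparkConf, SparkContext

# "yarn" is the master string for Spark 2.x; on Spark 1.x use "yarn-client" instead.
conf = SparkConf().setMaster("yarn").setAppName("emr-yarn-test")
sc = SparkContext(conf=conf)
print(sc.parallelize(range(100)).sum())   # quick smoke test; should print 4950
sc.stop()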

