I deployed a 3-node AWS Elastic MapReduce (EMR) cluster with Apache Spark installed. From my local machine I can access the master node via SSH:

ssh -i <key> hadoop@ec2-xx-xxx-xxx-xxx.compute-1.amazonaws.com

Once SSH'd into the master node, I can access PySpark via pyspark. In addition (albeit unsafe), I configured the master node's security group to accept TCP traffic specifically from my local machine's IP address on port 7077.
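For reference, the ingress rule I added amounts to roughly the following boto3 call; the region, security group ID, and IP address below are placeholders for my actual values:

import boto3

# Placeholder region, security group ID, and local IP address
ec2 = boto3.client("ec2", region_name="us-east-1")
ec2.authorize_security_group_ingress(
    GroupId="sg-0123456789abcdef0",   # security group of the EMR master node (placeholder)
    IpProtocol="tcp",
    FromPort=7077,                    # Spark standalone master port
    ToPort=7077,
    CidrIp="203.0.113.5/32",          # my local machine's public IP (placeholder)
)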
However, I still cannot connect my local PySpark instance to my cluster:
MASTER=spark://ec2-master-node-public-address:7077 ./bin/pyspark
The above command fails with a number of exceptions, and PySpark never initializes the SparkContext object.
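To be explicit about what I am attempting, it amounts to building the SparkContext against the remote master from a local Python shell, roughly like this (the master address is a placeholder for my cluster's public DNS):

from pyspark import SparkConf, SparkContext

# Placeholder master URL: the EMR master node's public DNS plus the Spark standalone port
conf = (SparkConf()
        .setMaster("spark://ec2-xx-xxx-xxx-xxx.compute-1.amazonaws.com:7077")
        .setAppName("remote-pyspark-test"))

sc = SparkContext(conf=conf)  # this is where the exceptions are raised and the context fails to initialize
print(sc.parallelize(list(range(10))).sum())
sc.stop()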
Does anyone know how to successfully create a remote connection like the one I describe above?
amazon-ec2 emr apache-spark pyspark
Soubhik