You probably need to add the PySpark files to the Python path. I usually use the following helper function.
    import os
    import sys

    def configure_spark(spark_home=None, pyspark_python=None):
        spark_home = spark_home or "/path/to/default/spark/home"
        os.environ['SPARK_HOME'] = spark_home
        # Make the PySpark package bundled with Spark importable:
        sys.path.insert(1, os.path.join(spark_home, 'python'))
        # Use the current interpreter for workers unless one is given:
        os.environ['PYSPARK_PYTHON'] = pyspark_python or sys.executable
Then you can call the function before importing pyspark:
    configure_spark('/path/to/spark/home')
    from pyspark import SparkContext
The Spark home on an EMR node should be something like /home/hadoop/spark. See https://aws.amazon.com/articles/Elastic-MapReduce/4926593393724923 for details.
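As a quick sanity check that the helper wires things up correctly, you can inspect the environment and sys.path after calling it. This sketch uses a throwaway /tmp/spark path since no real Spark install is assumed on your machine (on EMR you would pass the /home/hadoop/spark path instead), and repeats a condensed copy of the helper so the snippet is self-contained:

```python
import os
import sys

def configure_spark(spark_home=None, pyspark_python=None):
    # Condensed copy of the helper above, so this snippet runs standalone
    spark_home = spark_home or "/path/to/default/spark/home"
    os.environ['SPARK_HOME'] = spark_home
    sys.path.insert(1, os.path.join(spark_home, 'python'))
    os.environ['PYSPARK_PYTHON'] = pyspark_python or sys.executable

# '/tmp/spark' is a placeholder path for illustration only
configure_spark('/tmp/spark')
print(os.environ['SPARK_HOME'])          # the path we passed in
print('/tmp/spark/python' in sys.path)   # the python/ subdir is now importable
```

If the last line prints True, a subsequent `from pyspark import SparkContext` will resolve against that Spark installation (assuming it actually exists at the given path).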