I ran into a similar problem with another jar ("MongoDB connector for Spark", mongo-spark-connector), but the big caveat is that I installed Spark via pyspark in conda (conda install pyspark), so all of the Spark-specific answers were not entirely helpful. For those who installed with conda, here is the process I put together:
1) Find your pyspark/jars directory. Mine was at this path: ~/anaconda2/pkgs/pyspark-2.3.0-py27_0/lib/python2.7/site-packages/pyspark/jars.
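If you are unsure where that directory lives in your environment, here is a minimal sketch (my addition, not part of the original steps) that prints it, assuming pyspark is importable from the environment you installed it into:

    import os
    import pyspark

    # The jars bundled with a conda/pip pyspark install sit next to the
    # package itself, in a "jars" subdirectory.
    print(os.path.join(os.path.dirname(pyspark.__file__), "jars"))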
2) Download the jar file from this location and put it in the path found in step 1.
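As a convenience, the sketch below downloads a jar in Python; the URL is an assumption on my part, built from the standard Maven Central layout for the coordinates used in step 3 (org.mongodb.spark:mongo-spark-connector_2.11:2.2.2), and the destination is the path from step 1, so adjust both to your setup. The import shown is Python 3; on the Python 2.7 environment above, use from urllib import urlretrieve instead.

    import os
    from urllib.request import urlretrieve  # Python 2.7: from urllib import urlretrieve

    # Assumed Maven Central URL for the connector jar used in step 3.
    url = ("https://repo1.maven.org/maven2/org/mongodb/spark/"
           "mongo-spark-connector_2.11/2.2.2/mongo-spark-connector_2.11-2.2.2.jar")

    # The pyspark/jars path found in step 1 (adjust to your environment).
    jars_dir = os.path.expanduser(
        "~/anaconda2/pkgs/pyspark-2.3.0-py27_0/lib/python2.7/site-packages/pyspark/jars")

    urlretrieve(url, os.path.join(jars_dir, "mongo-spark-connector_2.11-2.2.2.jar"))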
3) Now you should be able to run something like this (code taken from the official MongoDB tutorial, using Breeford Wiley's answer above):
    from pyspark.sql import SparkSession

    my_spark = SparkSession \
        .builder \
        .appName("myApp") \
        .config("spark.mongodb.input.uri", "mongodb://127.0.0.1:27017/spark.test_pyspark_mbd_conn") \
        .config("spark.mongodb.output.uri", "mongodb://127.0.0.1:27017/spark.test_pyspark_mbd_conn") \
        .config('spark.jars.packages', 'org.mongodb.spark:mongo-spark-connector_2.11:2.2.2') \
        .getOrCreate()
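To check that the connector is actually picked up, here is a small smoke test I added (not from the original tutorial); it assumes a mongod instance is running at 127.0.0.1:27017 and uses com.mongodb.spark.sql.DefaultSource, the DataFrame source registered by the 2.x connector:

    # Write one row to the collection configured in the URIs above,
    # then read the collection back and print it.
    df = my_spark.createDataFrame([("hello", 1)], ["word", "count"])
    df.write.format("com.mongodb.spark.sql.DefaultSource").mode("append").save()

    my_spark.read.format("com.mongodb.spark.sql.DefaultSource").load().show()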
Disclaimer:
1) I do not know whether this answer or this SO question is the right place for it; please point out a better place and I will move it there.
2) If you think I got something wrong, or the process described above can be improved, please comment and I will revise it.