Add jars to pyspark when using a notebook

I am trying to integrate mongodb-hadoop with Spark, but I can't figure out how to make the jars accessible from an IPython notebook.

Here is what I am trying to do:

# set up parameters for reading from MongoDB via Hadoop input format
config = {"mongo.input.uri": "mongodb://localhost:27017/db.collection"}
inputFormatClassName = "com.mongodb.hadoop.MongoInputFormat"
# these values worked but others might as well
keyClassName = "org.apache.hadoop.io.Text"
valueClassName = "org.apache.hadoop.io.MapWritable"
# Do some reading from mongo
items = sc.newAPIHadoopRDD(inputFormatClassName, keyClassName, valueClassName, None, None, config)
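(A quick sanity check on the returned RDD, assuming db.collection above points at a non-empty collection; either call forces the actual read:

print(items.first())
print(items.count())
)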

This code works fine when I run it in pyspark using the following command:

spark-1.4.1/bin/pyspark --jars 'mongo-hadoop-core-1.4.0.jar,mongo-java-driver-3.0.2.jar'

where mongo-hadoop-core-1.4.0.jar and mongo-java-driver-3.0.2.jar are the jars that allow using MongoDB from Java. However, when I do this:

IPYTHON_OPTS="notebook" spark-1.4.1/bin/pyspark --jars 'mongo-hadoop-core-1.4.0.jar,mongo-java-driver-3.0.2.jar'

the jars are no longer available, and I get the following error:

java.lang.ClassNotFoundException: com.mongodb.hadoop.MongoInputFormat

Does anyone know how to make jars available to Spark in an IPython notebook? I am sure this is not specific to mongo, so maybe someone has already managed to add jars to the classpath when using a notebook?

+5
1 answer

Very similar, please let me know if this helps: https://issues.apache.org/jira/browse/SPARK-5185
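In case it helps others hitting the same ClassNotFoundException: one workaround for this class of problem (a sketch only, not specific to mongo-hadoop, and the /path/to paths are placeholders) is to put the jars on the driver and executor classpaths in conf/spark-defaults.conf, so the notebook launch path cannot drop them:

# conf/spark-defaults.conf -- replace /path/to with the real jar locations
spark.driver.extraClassPath    /path/to/mongo-hadoop-core-1.4.0.jar:/path/to/mongo-java-driver-3.0.2.jar
spark.executor.extraClassPath  /path/to/mongo-hadoop-core-1.4.0.jar:/path/to/mongo-java-driver-3.0.2.jar

With those set, IPYTHON_OPTS="notebook" spark-1.4.1/bin/pyspark should see the classes without any --jars flag.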

+4