I am using Spark 1.3.0 with Python. I have an application that reads an Avro file with the following code:
conf = None
rddAvro = sc.newAPIHadoopFile(
    fileAvro,
    "org.apache.avro.mapreduce.AvroKeyInputFormat",
    "org.apache.avro.mapred.AvroKey",
    "org.apache.hadoop.io.NullWritable",
    keyConverter="org.apache.spark.examples.pythonconverters.AvroWrapperToJavaConverter",
    conf=conf)
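For what it's worth, a quick way to check the load: each element of the resulting RDD is a (key, value) pair where the converter puts the decoded record, as a Python dict, in the key; the value half carries no data. (The fields you see depend entirely on your Avro schema.)

# Peek at the first decoded record; only the key half is meaningful here.
firstRecord = rddAvro.map(lambda kv: kv[0]).first()
print(firstRecord)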
In my conf/spark-defaults.conf, I have the following line:
spark.driver.extraClassPath /pathto/spark-1.3.0/lib/spark-examples-1.3.0-hadoop2.4.0.jar
I deployed a cluster of three machines (one master and two slaves):
- If I run spark-submit --master local on the master machine, it works.
- If I run spark-submit --master local on any of the slaves, it works.
- If I run sbin/start-all.sh and then spark-submit --master spark://cluster-data-master:7077, it fails with the following error:
java.lang.ClassNotFoundException:
org.apache.spark.examples.pythonconverters.AvroWrapperToJavaConverter
I can reproduce this error in local mode by commenting out the driver line in the .conf file. I also tried spark-submit with the appropriate --driver-class-path, but that does not work either!
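For reference, the failing cluster invocation has this shape (my_script.py stands in for my actual application, and the jar path is the one from my spark-defaults.conf):

spark-submit --master spark://cluster-data-master:7077 \
    --driver-class-path /pathto/spark-1.3.0/lib/spark-examples-1.3.0-hadoop2.4.0.jar \
    my_script.py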
Solution Update
I eventually got it working by using both of the following at the same time:
- spark-submit --driver-class-path path/to/appropriate.jar script.py on the command line, which makes the jar available to the driver (the spark-defaults.conf line achieves the same thing);
- SparkConf().set(...).set("spark.executor.extraClassPath", "path/to/appropriate.jar") in the Python script, which makes the jar available to the executors.

I never managed to get this to work through the conf argument or with --jars.
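Here is a minimal sketch of that second point, assuming the same jar path as in my spark-defaults.conf (adjust the path and the app name, which is hypothetical, to your setup):

from pyspark import SparkConf, SparkContext

# Ship the converter jar to the executors; the driver side is covered
# by --driver-class-path on the spark-submit command line.
conf = (SparkConf()
        .setAppName("avro-reader")
        .set("spark.executor.extraClassPath",
             "/pathto/spark-1.3.0/lib/spark-examples-1.3.0-hadoop2.4.0.jar"))
sc = SparkContext(conf=conf)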