Spark 1.3.0, Python, Avro files: driver class path set in spark-defaults.conf but not seen by the slaves

I am using Spark 1.3.0 with Python. I have an application that reads an Avro file with the following commands:

conf = None  # no extra Hadoop configuration is needed here

rddAvro = sc.newAPIHadoopFile(
    fileAvro,
    "org.apache.avro.mapreduce.AvroKeyInputFormat",
    "org.apache.avro.mapred.AvroKey",
    "org.apache.hadoop.io.NullWritable",
    keyConverter="org.apache.spark.examples.pythonconverters.AvroWrapperToJavaConverter",
    conf=conf)

In my conf/spark-defaults.conf, I have the following line:

spark.driver.extraClassPath /pathto/spark-1.3.0/lib/spark-examples-1.3.0-hadoop2.4.0.jar

I have set up a cluster of three machines (one master and two slaves):

  • If I run spark-submit --master local on the master machine, it works.
  • If I run spark-submit --master local on either of the slaves, it works.
  • If I run sbin/start-all.sh and then spark-submit --master spark://cluster-data-master:7077, it fails with the following error:

    java.lang.ClassNotFoundException:
    org.apache.spark.examples.pythonconverters.AvroWrapperToJavaConverter
    

I can reproduce this error in local mode by commenting out the driver line in the .conf file. I also tried spark-submit with the appropriate --driver-class-path, but that does not work either!
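For clarity, the kind of command I tried looks roughly like the following; the jar path is the one from my spark-defaults.conf, and the script name is just a placeholder:

spark-submit --master spark://cluster-data-master:7077 --driver-class-path /pathto/spark-1.3.0/lib/spark-examples-1.3.0-hadoop2.4.0.jar my_avro_script.py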

Solution Update

In short, the jar has to be visible to the executors as well, not only to the driver. The following approaches work:

  • passing the jar on the command line: spark-submit --driver-class-path path/to/appropriate.jar script
  • setting spark.executor.extraClassPath to the jar in the spark-defaults.conf file
  • setting the jar directly in the Python code with SparkConf().set(...).set("spark.executor.extraClassPath", "path/to/appropriate.jar"), as in the sketch below

Passing the jar only with --jars or only through conf did not work for me.
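Here is a minimal sketch of the last option, assuming the jar path from my spark-defaults.conf; the application name and the input path are placeholders:

from pyspark import SparkConf, SparkContext

# Point the executors at the jar that contains AvroWrapperToJavaConverter.
# The driver side still comes from spark-defaults.conf or --driver-class-path.
conf = (SparkConf()
        .setAppName("avro-reader")
        .set("spark.executor.extraClassPath",
             "/pathto/spark-1.3.0/lib/spark-examples-1.3.0-hadoop2.4.0.jar"))

sc = SparkContext(conf=conf)

fileAvro = "/pathto/data/some_file.avro"  # placeholder input file

rddAvro = sc.newAPIHadoopFile(
    fileAvro,
    "org.apache.avro.mapreduce.AvroKeyInputFormat",
    "org.apache.avro.mapred.AvroKey",
    "org.apache.hadoop.io.NullWritable",
    keyConverter="org.apache.spark.examples.pythonconverters.AvroWrapperToJavaConverter")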

Have you tried running with --master yarn-cluster?

Also, make sure the following YARN memory settings are large enough for your executors:

yarn.nodemanager.resource.memory-mb

yarn.scheduler.maximum-allocation-mb

For example:

spark-submit --master yarn-client --num-executors 5 --driver-cores 8 --driver-memory 50G --executor-memory 44G code_to_run.py
