Spark 1.3.0, Python, Avro files: driver class path set in spark-defaults.conf but not seen by the slaves

I am using Spark 1.3.0 with Python. I have an application that reads an Avro file with the following commands:

conf = None  # no extra Hadoop configuration is needed here

rddAvro = sc.newAPIHadoopFile(
    fileAvro,
    "org.apache.avro.mapreduce.AvroKeyInputFormat",
    "org.apache.avro.mapred.AvroKey",
    "org.apache.hadoop.io.NullWritable",
    keyConverter="org.apache.spark.examples.pythonconverters.AvroWrapperToJavaConverter",
    conf=conf)

In my conf/spark-defaults.conf, I have the following line:

spark.driver.extraClassPath /pathto/spark-1.3.0/lib/spark-examples-1.3.0-hadoop2.4.0.jar

I have set up a cluster of three machines (one master and two slaves):

  • If I run spark-submit --master local on the master machine, it works.
  • If I run spark-submit --master local on either of the slaves, it works.
  • If I run sbin/start-all.sh and then spark-submit --master spark://cluster-data-master:7077, it fails with the following error:

    java.lang.ClassNotFoundException:
    org.apache.spark.examples.pythonconverters.AvroWrapperToJavaConverter
    

I can reproduce this error in local mode by commenting out the driver line in the .conf file. I also tried spark-submit with the appropriate --driver-class-path, but that does not work either!
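For clarity, the kind of command I tried looks roughly like the following; the jar path is the one from my spark-defaults.conf, and the script name is just a placeholder:

spark-submit --master spark://cluster-data-master:7077 --driver-class-path /pathto/spark-1.3.0/lib/spark-examples-1.3.0-hadoop2.4.0.jar my_avro_script.py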

Solution Update

In short, the jar has to be visible to the executors as well, not only to the driver. The following approaches work:

  • passing the jar on the command line: spark-submit --driver-class-path path/to/appropriate.jar script
  • setting spark.executor.extraClassPath to the jar in the spark-defaults.conf file
  • setting the jar directly in the Python code with SparkConf().set(...).set("spark.executor.extraClassPath", "path/to/appropriate.jar"), as in the sketch below

Passing the jar only with --jars or only through conf did not work for me.
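Here is a minimal sketch of the last option, assuming the jar path from my spark-defaults.conf; the application name and the input path are placeholders:

from pyspark import SparkConf, SparkContext

# Point the executors at the jar that contains AvroWrapperToJavaConverter.
# The driver side still comes from spark-defaults.conf or --driver-class-path.
conf = (SparkConf()
        .setAppName("avro-reader")
        .set("spark.executor.extraClassPath",
             "/pathto/spark-1.3.0/lib/spark-examples-1.3.0-hadoop2.4.0.jar"))

sc = SparkContext(conf=conf)

fileAvro = "/pathto/data/some_file.avro"  # placeholder input file

rddAvro = sc.newAPIHadoopFile(
    fileAvro,
    "org.apache.avro.mapreduce.AvroKeyInputFormat",
    "org.apache.avro.mapred.AvroKey",
    "org.apache.hadoop.io.NullWritable",
    keyConverter="org.apache.spark.examples.pythonconverters.AvroWrapperToJavaConverter")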

Have you tried running with --master yarn-cluster?

Also, make sure the following YARN memory settings are large enough for your executors:

yarn.nodemanager.resource.memory-mb

yarn.scheduler.maximum-allocation-mb

For example:

spark-submit --master yarn-client --num-executors 5 --driver-cores 8 --driver-memory 50G --executor-memory 44G code_to_run.py
