Spark Streaming Kafka Python wordcount fails with ClassNotFoundException under spark-submit

I copied the Python code for the Spark Streaming Kafka wordcount example and used spark-submit to run it on the Spark cluster, but it fails with the following error:

  py4j.protocol.Py4JJavaError: An error occurred while calling o23.loadClass.
  : java.lang.ClassNotFoundException: org.apache.spark.streaming.kafka.KafkaUtilsPythonHelper
      at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
      at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
      at java.security.AccessController.doPrivileged(Native Method)
      at java.net.URLClassLoader.findClass(URLClassLoader.java:354)

I built the assembly jar spark-streaming-kafka-assembly_2.10-1.4.0-SNAPSHOT.jar, and I submit the job with the following command:

  bin/spark-submit /data/spark-1.3.0-bin-hadoop2.4/wordcount.py --master spark://192.168.100.6:7077 --jars /data/spark-1.3.0-bin-hadoop2.4/kafka-assembly/target/spark-streaming-kafka-assembly_*.jar

Thanks in advance!

2 answers

In fact, I just realized that you put --jars after the script. Jar files will not be included unless --jars is specified before the script name. So use spark-submit --jars spark-streaming-kafka-assembly_2.10-1.3.1.jar script.py instead of spark-submit script.py --jars spark-streaming-kafka-assembly_2.10-1.3.1.jar.
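The reason the order matters: spark-submit stops parsing its own options at the application script, and everything after the script name is passed through to the application's sys.argv. A minimal sketch of that split (the helper split_submit_args is illustrative, not part of Spark):

```python
# Everything after the script name lands in the application's sys.argv,
# so a trailing --jars flag is never seen by spark-submit itself.
import shlex

def split_submit_args(cmdline):
    """Split a spark-submit command line into (submit_flags, script, app_args)."""
    tokens = shlex.split(cmdline)[1:]  # drop the leading 'spark-submit' token
    for i, tok in enumerate(tokens):
        if tok.endswith(".py"):  # treat the first .py token as the app script
            return tokens[:i], tok, tokens[i + 1:]
    return tokens, None, []

# Wrong order: --jars ends up in the app's arguments, ignored by spark-submit.
flags, script, app_args = split_submit_args(
    "spark-submit script.py --jars spark-streaming-kafka-assembly_2.10-1.3.1.jar")
print(flags)     # []
print(app_args)  # ['--jars', 'spark-streaming-kafka-assembly_2.10-1.3.1.jar']

# Right order: spark-submit sees --jars before the script name.
flags, script, app_args = split_submit_args(
    "spark-submit --jars spark-streaming-kafka-assembly_2.10-1.3.1.jar script.py")
print(flags)     # ['--jars', 'spark-streaming-kafka-assembly_2.10-1.3.1.jar']
print(app_args)  # []
```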


I had to reference several jars in my command to get this to work. Try specifying the jars explicitly; they may not be picked up correctly from the assembly jar you created.

  /opt/spark/spark-1.3.1-bin-hadoop2.6/bin/spark-submit --jars /root/spark-streaming-kafka_2.10-1.3.1.jar,/usr/hdp/2.2.4.2-2/kafka/libs/kafka_2.10-0.8.1.2.2.4.2-2.jar,/usr/hdp/2.2.4.2-2/kafka/libs/zkclient-0.3.jar,/root/.m2/repository/com/yammer/metrics/metrics-core/2.2.0/metrics-core-2.2.0.jar kafka_wordcount.py kafkaAddress:2181 topicName 

In particular, it did not seem to be picking up this jar: kafka_2.10-0.8.1.2.2.4.2-2.jar
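One way to check whether a given jar actually bundles the class that the ClassNotFoundException names is to look inside it, since a jar is just a zip archive. A small sketch (the helper jar_contains_class and the paths are illustrative; the class name comes from the error above):

```python
# Check whether a jar (a zip archive) contains a fully-qualified Java class,
# e.g. org.apache.spark.streaming.kafka.KafkaUtilsPythonHelper.
import zipfile

def jar_contains_class(jar_path, class_name):
    """Return True if the jar has an entry for the given class."""
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()

# Usage (paths are examples for your own environment):
# jar_contains_class("/root/spark-streaming-kafka_2.10-1.3.1.jar",
#                    "org.apache.spark.streaming.kafka.KafkaUtilsPythonHelper")
```

If the assembly jar is missing the class, that explains the ClassNotFoundException, and listing the missing jars explicitly in --jars, as in the command above, is the workaround.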

