Spark Streaming Kafka Python wordcount fails with ClassNotFoundException under spark-submit

I copied the Python code for the Spark Streaming Kafka wordcount example and used spark-submit to run it on the Spark cluster, but it fails with the following error:

  py4j.protocol.Py4JJavaError: An error occurred while calling o23.loadClass.
  : java.lang.ClassNotFoundException: org.apache.spark.streaming.kafka.KafkaUtilsPythonHelper
      at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
      at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
      at java.security.AccessController.doPrivileged(Native Method)
      at java.net.URLClassLoader.findClass(URLClassLoader.java:354)

I built the assembly jar spark-streaming-kafka-assembly_2.10-1.4.0-SNAPSHOT.jar, and I submit the job with the following command:

  bin/spark-submit /data/spark-1.3.0-bin-hadoop2.4/wordcount.py --master spark://192.168.100.6:7077 --jars /data/spark-1.3.0-bin-hadoop2.4/kafka-assembly/target/spark-streaming-kafka-assembly_*.jar

Thanks in advance!

2 answers

In fact, I just realized that you put --jars after the script. Jar files will not be included unless --jars is specified before the script name. So use spark-submit --jars spark-streaming-kafka-assembly_2.10-1.3.1.jar script.py instead of spark-submit script.py --jars spark-streaming-kafka-assembly_2.10-1.3.1.jar.
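The reason the order matters: spark-submit stops parsing its own options at the application script, and everything after the script name is passed through to the application's sys.argv. A minimal sketch of that split (the helper split_submit_args is illustrative, not part of Spark):

```python
# Everything after the script name lands in the application's sys.argv,
# so a trailing --jars flag is never seen by spark-submit itself.
import shlex

def split_submit_args(cmdline):
    """Split a spark-submit command line into (submit_flags, script, app_args)."""
    tokens = shlex.split(cmdline)[1:]  # drop the leading 'spark-submit' token
    for i, tok in enumerate(tokens):
        if tok.endswith(".py"):  # treat the first .py token as the app script
            return tokens[:i], tok, tokens[i + 1:]
    return tokens, None, []

# Wrong order: --jars ends up in the app's arguments, ignored by spark-submit.
flags, script, app_args = split_submit_args(
    "spark-submit script.py --jars spark-streaming-kafka-assembly_2.10-1.3.1.jar")
print(flags)     # []
print(app_args)  # ['--jars', 'spark-streaming-kafka-assembly_2.10-1.3.1.jar']

# Right order: spark-submit sees --jars before the script name.
flags, script, app_args = split_submit_args(
    "spark-submit --jars spark-streaming-kafka-assembly_2.10-1.3.1.jar script.py")
print(flags)     # ['--jars', 'spark-streaming-kafka-assembly_2.10-1.3.1.jar']
print(app_args)  # []
```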


I had to reference several jars in my command to get this to work. Try specifying the jars explicitly; they may not be picked up correctly from the assembly jar you created.

  /opt/spark/spark-1.3.1-bin-hadoop2.6/bin/spark-submit --jars /root/spark-streaming-kafka_2.10-1.3.1.jar,/usr/hdp/2.2.4.2-2/kafka/libs/kafka_2.10-0.8.1.2.2.4.2-2.jar,/usr/hdp/2.2.4.2-2/kafka/libs/zkclient-0.3.jar,/root/.m2/repository/com/yammer/metrics/metrics-core/2.2.0/metrics-core-2.2.0.jar kafka_wordcount.py kafkaAddress:2181 topicName 

In particular, it did not seem to be picking up this jar: kafka_2.10-0.8.1.2.2.4.2-2.jar
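One way to check whether a given jar actually bundles the class that the ClassNotFoundException names is to look inside it, since a jar is just a zip archive. A small sketch (the helper jar_contains_class and the paths are illustrative; the class name comes from the error above):

```python
# Check whether a jar (a zip archive) contains a fully-qualified Java class,
# e.g. org.apache.spark.streaming.kafka.KafkaUtilsPythonHelper.
import zipfile

def jar_contains_class(jar_path, class_name):
    """Return True if the jar has an entry for the given class."""
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()

# Usage (paths are examples for your own environment):
# jar_contains_class("/root/spark-streaming-kafka_2.10-1.3.1.jar",
#                    "org.apache.spark.streaming.kafka.KafkaUtilsPythonHelper")
```

If the assembly jar is missing the class, that explains the ClassNotFoundException, and listing the missing jars explicitly in --jars, as in the command above, is the workaround.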

