I get the following error:
Py4JError: An error occurred while calling o73.createDirectStreamWithoutMessageHandler. Trace:
py4j.Py4JException: Method createDirectStreamWithoutMessageHandler([class org.apache.spark.streaming.api.java.JavaStreamingContext, class java.util.HashSet, class java.util.HashMap]) does not exist
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:335)
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:344)
    at py4j.Gateway.invoke(Gateway.java:252)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:209)
    at java.lang.Thread.run(Thread.java:745)
I am using spark-streaming-kafka-assembly_2.10-1.6.0.jar, which is present in the /usr/lib/hadoop/lib/ folder on all my worker nodes and on the master.
(EDIT) Actual error: java.lang.NoSuchMethodError: org.apache.hadoop.yarn.util.Apps.crossPlatformify(Ljava/lang/String;)Ljava/lang/String;
This was caused by a Hadoop version mismatch. Spark therefore has to be compiled against the correct Hadoop version:
mvn -Phadoop-2.6 -Dhadoop.version=2.7.2 -DskipTests clean package
This creates the assembly jar in the external/kafka-assembly/target folder.
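Before copying the rebuilt jar to the nodes, it can be worth checking that it contains the class PySpark calls into. Below is a minimal sketch using only the Python standard library. The function name jar_contains_class is hypothetical, and the helper class name in the usage comment is an assumption about Spark 1.6 that you should adjust for your version.

```python
import zipfile


def jar_contains_class(jar_path, class_name):
    """Return True if the jar (a zip archive) has an entry for class_name."""
    # A class a.b.C is stored inside the jar as a/b/C.class
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()


# Example usage (path and class name are assumptions, not taken from the build output):
# jar_contains_class(
#     "/usr/lib/hadoop/lib/spark-streaming-kafka-assembly_2.10-1.6.0.jar",
#     "org.apache.spark.streaming.kafka.KafkaUtilsPythonHelper",
# )
```

This only confirms that the class is present in the jar. It does not tell you which Hadoop version the jar was built against, so it complements the rebuild rather than replacing it.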