Spark java.lang.VerifyError

I get the following error when I run the code below using the Python client for Spark (PySpark).

lines = sc.textFile("hdfs://...")
lines.take(10)

I suspect that the Spark and Hadoop versions may not be compatible. Here is the output of hadoop version:

Hadoop 2.5.2
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r cc72e9b000545b86b75a61f4835eb86d57bfafc0
Compiled by jenkins on 2014-11-14T23:45Z
Compiled with protoc 2.5.0
From source with checksum df7537a4faa4658983d397abf4514320
This command was run using /etc/hadoop-2.5.2/share/hadoop/common/hadoop-common-2.5.2.jar

I am also using Spark 1.3.1.
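For reference, a minimal sketch like the one below (assuming an existing SparkContext named sc, and using the non-public sc._jvm handle that PySpark exposes via py4j) should show which jars the driver JVM actually loaded the conflicting classes from, in case that helps pinpoint the mismatch:

# Minimal sketch, assuming an existing SparkContext "sc".
# sc._jvm is a non-public PySpark handle to the driver JVM (via py4j).
def jar_of(class_name):
    cls = sc._jvm.java.lang.Class.forName(class_name)
    src = cls.getProtectionDomain().getCodeSource()
    return src.getLocation().toString() if src is not None else "bootstrap classpath"

# One class from protobuf, one from hadoop-common; both sides of the conflict.
print(jar_of("com.google.protobuf.UnknownFieldSet"))
print(jar_of("org.apache.hadoop.fs.FileSystem"))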

 File "/etc/spark/python/pyspark/rdd.py", line 1194, in take totalParts = self._jrdd.partitions().size() File "/etc/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__ File "/etc/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value py4j.protocol.Py4JJavaError: An error occurred while calling o21.partitions. : java.lang.VerifyError: class org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$AppendRequestProto overrides final method getUnknownFields. ()Lcom/google/protobuf/UnknownFieldSet; at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631) at java.lang.ClassLoader.defineClass(ClassLoader.java:615) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141) at java.net.URLClassLoader.defineClass(URLClassLoader.java:283) at java.net.URLClassLoader.access$000(URLClassLoader.java:58) at java.net.URLClassLoader$1.run(URLClassLoader.java:197) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:190) at java.lang.ClassLoader.loadClass(ClassLoader.java:306) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) at java.lang.ClassLoader.loadClass(ClassLoader.java:247) at java.lang.Class.getDeclaredMethods0(Native Method) at java.lang.Class.privateGetDeclaredMethods(Class.java:2436) at java.lang.Class.privateGetPublicMethods(Class.java:2556) at java.lang.Class.privateGetPublicMethods(Class.java:2566) at java.lang.Class.getMethods(Class.java:1412) at sun.misc.ProxyGenerator.generateClassFile(ProxyGenerator.java:409) at sun.misc.ProxyGenerator.generateProxyClass(ProxyGenerator.java:306) at java.lang.reflect.Proxy.getProxyClass0(Proxy.java:610) at java.lang.reflect.Proxy.newProxyInstance(Proxy.java:690) at org.apache.hadoop.ipc.ProtobufRpcEngine.getProxy(ProtobufRpcEngine.java:92) at org.apache.hadoop.ipc.RPC.getProtocolProxy(RPC.java:537) at org.apache.hadoop.hdfs.NameNodeProxies.createNNProxyWithClientProtocol(NameNodeProxies.java:366) at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:262) at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:153) at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:602) at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:547) at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:139) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591) at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89) at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2625) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2607) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368) at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296) at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:256) at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228) at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313) at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:203) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:217) at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32) at 
org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:217) at org.apache.spark.api.java.JavaRDDLike$class.partitions(JavaRDDLike.scala:64) at org.apache.spark.api.java.AbstractJavaRDDLike.partitions(JavaRDDLike.scala:46) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231) at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379) at py4j.Gateway.invoke(Gateway.java:259) 

I searched around for this problem; some people point to the protobuf version, but I do not know how to configure it correctly. Any ideas?

2 answers

Check the pom.xml of the Spark distribution you compiled and find the protobuf version it was built with. Making it match the protobuf version your Hadoop uses (2.5.0 here, per the "Compiled with protoc" line in your hadoop version output) can solve the problem.
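If it helps, a small sketch like this (assuming the Spark sources you built from live at /path/to/spark, a placeholder to adjust) reads the protobuf.version property from the top-level pom.xml, if it is declared there. Note that the hadoop-2.x build profiles may override this property, so also check the flags you built with.

# Minimal sketch: print the protobuf.version declared in Spark's root pom.xml.
# "/path/to/spark" is a placeholder for wherever you compiled Spark from source.
import xml.etree.ElementTree as ET

ns = {"m": "http://maven.apache.org/POM/4.0.0"}
root = ET.parse("/path/to/spark/pom.xml").getroot()
prop = root.find("m:properties/m:protobuf.version", ns)
print("protobuf.version:", prop.text if prop is not None else "not declared in the root pom")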

Or the problem may be something else, as mentioned in this Jira ticket:

https://issues.apache.org/jira/browse/SPARK-7238


You need to check which version of the py4j jar is required by this version of Hadoop. Download it and put it in the lib folder of your Spark installation directory, and check your .bashrc for the path reference. That will fix this error.
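For example, a quick read-only sketch like the one below (nothing here is specific to your setup; it just reports what the running PySpark session sees) shows which py4j entry ended up on sys.path and what SPARK_HOME points to, so you can confirm the paths from .bashrc are the ones actually being used:

# Minimal sketch: report the py4j entry on sys.path and the SPARK_HOME in use.
import os
import sys

print("SPARK_HOME =", os.environ.get("SPARK_HOME"))
for p in sys.path:
    if "py4j" in p:
        print("py4j on sys.path:", p)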

