The Shark/Spark development section of the wiki is really brief, so I tried to write some code to query a table programmatically. Here it is:
    object Test extends App {
      val master = "spark://localhost.localdomain:8084"
      val jobName = "scratch"
      val sparkHome = "/home/shengc/Downloads/software/spark-0.6.1"
      val executorEnvVars = Map[String, String](
        "SPARK_MEM" -> "1g",
        "SPARK_CLASSPATH" -> "",
        "HADOOP_HOME" -> "/home/shengc/Downloads/software/hadoop-0.20.205.0",
        "JAVA_HOME" -> "/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64",
        "HIVE_HOME" -> "/home/shengc/Downloads/software/hive-0.9.0-bin"
      )

      val sc = new shark.SharkContext(master, jobName, sparkHome, Nil, executorEnvVars)
      sc.sql2console("create table src")
      sc.sql2console("load data local inpath '/home/shengc/Downloads/software/hive-0.9.0-bin/examples/files/kv1.txt' into table src")
      sc.sql2console("select count(1) from src")
    }
I can create the src table and load data into it fine, but the last query threw an NPE and failed. Here is the output:
    13/01/06 17:33:20 INFO execution.SparkTask: Executing shark.execution.SparkTask
    13/01/06 17:33:20 INFO shark.SharkEnv: Initializing SharkEnv
    13/01/06 17:33:20 INFO execution.SparkTask: Adding jar file:///home/shengc/workspace/shark/hive/lib/hive-builtins-0.9.0.jar
    java.lang.NullPointerException
        at shark.execution.SparkTask$$anonfun$execute$5.apply(SparkTask.scala:58)
        at shark.execution.SparkTask$$anonfun$execute$5.apply(SparkTask.scala:55)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:34)
        at scala.collection.mutable.ArrayOps.foreach(ArrayOps.scala:38)
        at shark.execution.SparkTask.execute(SparkTask.scala:55)
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:134)
        at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
        at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1326)
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1118)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:951)
        at shark.SharkContext.sql(SharkContext.scala:58)
        at shark.SharkContext.sql2console(SharkContext.scala:84)
        at Test$delayedInit$body.apply(Test.scala:20)
        at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
        at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
        at scala.App$$anonfun$main$1.apply(App.scala:60)
        at scala.App$$anonfun$main$1.apply(App.scala:60)
        at scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:59)
        at scala.collection.immutable.List.foreach(List.scala:76)
        at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:30)
        at scala.App$class.main(App.scala:60)
        at Test$.main(Test.scala:4)
        at Test.main(Test.scala)
    FAILED: Execution Error, return code -101 from shark.execution.SparkTask
    13/01/06 17:33:20 ERROR ql.Driver: FAILED: Execution Error, return code -101 from shark.execution.SparkTask
    13/01/06 17:33:20 INFO ql.Driver: </PERFLOG method=Driver.execute start=1357511600030 end=1357511600054 duration=24>
    13/01/06 17:33:20 INFO ql.Driver: <PERFLOG method=releaseLocks>
    13/01/06 17:33:20 INFO ql.Driver: </PERFLOG method=releaseLocks start=1357511600054 end=1357511600054 duration=0>
However, I can query the src table just fine by typing "select * from src" in the shell invoked by bin/shark-withinfo.
You may ask me to try that SQL in the shell invoked by bin/shark-shell. Well, I can't get into that shell. Here is the error I encountered:
https://groups.google.com/forum/?fromgroups=#!topic/shark-users/glZzrUfabGc
[EDIT 1]: this NPE seems to come from the fact that SharkEnv.sc was never set, so I added
    shark.SharkEnv.sc = sc
before performing any sql2console operations. Then it complained about a ClassNotFoundException for scala.tools.nsc, so I manually put the Scala compiler jar on the classpath. After that, the code complained about yet another ClassNotFoundException, which I cannot figure out how to fix, because I did put the shark jar on my classpath:
    13/01/06 18:09:34 INFO cluster.TaskSetManager: Lost TID 1 (task 1.0:1)
    13/01/06 18:09:34 INFO cluster.TaskSetManager: Loss was due to java.lang.ClassNotFoundException: shark.execution.TableScanOperator$$anonfun$preprocessRdd$3
        at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:264)
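For completeness, here is roughly what the first attempt looks like with that one-line fix applied. This is just a sketch of my own code; it gets past the NPE but still dies with the ClassNotFoundException above:

    // First attempt plus the fix from EDIT 1. Gets past the NPE, but the
    // executors still fail with the ClassNotFoundException shown above.
    object Test extends App {
      val master = "spark://localhost.localdomain:8084"
      val jobName = "scratch"
      val sparkHome = "/home/shengc/Downloads/software/spark-0.6.1"
      val executorEnvVars = Map[String, String](
        "SPARK_MEM" -> "1g",
        "SPARK_CLASSPATH" -> "",
        "HADOOP_HOME" -> "/home/shengc/Downloads/software/hadoop-0.20.205.0",
        "JAVA_HOME" -> "/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64",
        "HIVE_HOME" -> "/home/shengc/Downloads/software/hive-0.9.0-bin"
      )

      val sc = new shark.SharkContext(master, jobName, sparkHome, Nil, executorEnvVars)
      // The crucial line: SparkTask looks the context up on the SharkEnv
      // singleton rather than on the instance we constructed ourselves.
      shark.SharkEnv.sc = sc

      sc.sql2console("select count(1) from src")
    }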
[EDIT 2]: OK, I figured out another piece of code that does what I want, by following exactly Shark's own source code for initializing the interactive repl.
System.setProperty("MASTER", "spark://localhost.localdomain:8084") System.setProperty("SPARK_MEM", "1g") System.setProperty("SPARK_CLASSPATH", "") System.setProperty("HADOOP_HOME", "/home/shengc/Downloads/software/hadoop-0.20.205.0") System.setProperty("JAVA_HOME", "/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64") System.setProperty("HIVE_HOME", "/home/shengc/Downloads/software/hive-0.9.0-bin") System.setProperty("SCALA_HOME", "/home/shengc/Downloads/software/scala-2.9.2") shark.SharkEnv.initWithSharkContext("scratch") val sc = shark.SharkEnv.sc.asInstanceOf[shark.SharkContext] sc.sql2console("select * from src")
It's ugly, but at least it works. Any comments on how to write a more robust piece of code are welcome!
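In the meantime, here is the same working snippet wrapped into a self-contained program, which is how I run it now. The object name is my own choice, and the comments reflect my reading of the repl init code, so treat it as a sketch rather than the blessed way:

    // The working approach from EDIT 2, as a self-contained program.
    // Paths are hard-coded for my machine; adjust them for yours.
    object SharkScratch extends App {
      // The repl init code appears to pick these up as JVM system properties,
      // so they must be set before initWithSharkContext is called.
      System.setProperty("MASTER", "spark://localhost.localdomain:8084")
      System.setProperty("SPARK_MEM", "1g")
      System.setProperty("SPARK_CLASSPATH", "")
      System.setProperty("HADOOP_HOME", "/home/shengc/Downloads/software/hadoop-0.20.205.0")
      System.setProperty("JAVA_HOME", "/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64")
      System.setProperty("HIVE_HOME", "/home/shengc/Downloads/software/hive-0.9.0-bin")
      System.setProperty("SCALA_HOME", "/home/shengc/Downloads/software/scala-2.9.2")

      // Let Shark construct and register the context itself, exactly as the
      // interactive repl does, instead of new-ing up a SharkContext by hand.
      shark.SharkEnv.initWithSharkContext("scratch")
      val sc = shark.SharkEnv.sc.asInstanceOf[shark.SharkContext]

      sc.sql2console("select * from src")
    }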
For those who want to work with Shark programmatically: note that all Hive and Shark jars must be on your CLASSPATH, and the Scala compiler must be on your classpath as well. Another important thing: the Hadoop conf directory should be on the classpath too.
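A cheap way to catch these classpath problems early is to probe for a few representative classes at startup before touching Shark. This is only a sketch; the probe list below is my own choice of one class per requirement, and the exact names may differ across Hive/Shark/Hadoop versions:

    // Quick classpath sanity check, run before initializing Shark.
    object ClasspathCheck extends App {
      val probes = Seq(
        "shark.SharkContext",                    // shark jar
        "org.apache.hadoop.hive.conf.HiveConf",  // hive jars
        "scala.tools.nsc.Global"                 // scala compiler jar
      )
      for (name <- probes) {
        try {
          Class.forName(name)
          println("OK      " + name)
        } catch {
          case _: ClassNotFoundException => println("MISSING " + name)
        }
      }
      // The hadoop conf dir must be on the classpath too;
      // looking for core-site.xml as a resource is a cheap probe for that.
      val conf = getClass.getClassLoader.getResource("core-site.xml")
      println(if (conf != null) "OK      core-site.xml"
              else "MISSING core-site.xml (is the hadoop conf dir on the classpath?)")
    }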