I get an NPE when trying to coalesce and save an RDD.
The code works locally and in the cluster's Scala shell, but it throws the error when submitted as a job to the cluster.
I tried printing with take() to see if the RDD contains any null data, but that triggers the same error, which is a pain because the same call works fine in the shell.
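For reference, the spot check looked roughly like this (a minimal sketch; the sample size of 10, the choice of labDistVect as the RDD to inspect, and the exact null test are illustrative, not the literal code):

// Pull a handful of elements to the driver and inspect them for nulls;
// on the cluster this take() fails with the same NPE as the save does
labDistVect.take(10).foreach { p =>
  if (p == null || p.features == null) println("found a null element")
  else println(s"label=${p.label} features=${p.features}")
}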
The path points to HDFS and I have the full URL in a variable; the model was saved to that path with MLlib's save method during the training phase.
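For context, the training job persisted the model with MLlib's standard save method, along these lines (a sketch only; the trainClassifier parameters and the trainingData RDD are placeholders, not the actual settings):

import org.apache.spark.mllib.tree.RandomForest
import org.apache.spark.mllib.tree.model.RandomForestModel

// Training phase (a separate job): fit a forest and persist it to HDFS
val rfModel: RandomForestModel = RandomForest.trainClassifier(
  trainingData,                 // RDD[LabeledPoint], placeholder
  numClasses = 2,               // placeholder values from here on
  categoricalFeaturesInfo = Map[Int, Int](),
  numTrees = 100,
  featureSubsetStrategy = "auto",
  impurity = "gini",
  maxDepth = 10,
  maxBins = 32
)
rfModel.save(sc, modelPath)     // modelPath is the same full HDFS URL used at load time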
Any ideas are greatly appreciated!
Scala code (the whole prediction function):
//Load the Random Forest
val rfModel = RandomForestModel.load(sc, modelPath)

//Make the predictions - Here the label is the unique ID of the point
val rfPreds = labDistVect.map(p => (p.label, rfModel.predict(p.features)))

//Collect and save
println("Done Modelling, now saving preds")
val outP = rfPreds.coalesce(1, true).saveAsTextFile(outPreds)

println("Done Modelling, now saving coords")
val outC = coords.coalesce(1, true).saveAsTextFile(outCoords)
Stack trace:
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 6.0 failed 4 times, most recent failure: Lost task 0.3 in stage 6.0 (TID 40, XX.XX.XX.XX): java.lang.NullPointerException
    at GeoDistPredict1$$anonfun$38.apply(GeoDist1.scala:340)
    at GeoDistPredict1$$anonfun$38.apply(GeoDist1.scala:340)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$$anon$10.next(Iterator.scala:312)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
java nullpointerexception scala hadoop apache-spark
Dusted Oct 03 '15 at 13:40