I get an NPE when trying to coalesce and save an RDD.
The code works locally and in the cluster's Scala shell, but it throws the error when submitted as a job to the cluster.
I tried printing with take() to see if the RDD contains any null data, but that triggers the same error, which is a pain because the same call works fine in the shell.
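For reference, the spot check looked roughly like this (a minimal sketch; the sample size of 10, the choice of labDistVect as the RDD to inspect, and the exact null test are illustrative, not the literal code):

// Pull a handful of elements to the driver and inspect them for nulls;
// on the cluster this take() fails with the same NPE as the save does
labDistVect.take(10).foreach { p =>
  if (p == null || p.features == null) println("found a null element")
  else println(s"label=${p.label} features=${p.features}")
}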
The path points to HDFS and I have the full URL in a variable; the model was saved to that path with MLlib's save method during the training phase.
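For context, the training job persisted the model with MLlib's standard save method, along these lines (a sketch only; the trainClassifier parameters and the trainingData RDD are placeholders, not the actual settings):

import org.apache.spark.mllib.tree.RandomForest
import org.apache.spark.mllib.tree.model.RandomForestModel

// Training phase (a separate job): fit a forest and persist it to HDFS
val rfModel: RandomForestModel = RandomForest.trainClassifier(
  trainingData,                 // RDD[LabeledPoint], placeholder
  numClasses = 2,               // placeholder values from here on
  categoricalFeaturesInfo = Map[Int, Int](),
  numTrees = 100,
  featureSubsetStrategy = "auto",
  impurity = "gini",
  maxDepth = 10,
  maxBins = 32
)
rfModel.save(sc, modelPath)     // modelPath is the same full HDFS URL used at load time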
Any ideas are greatly appreciated!
Scala code (the whole prediction function):
//Load the Random Forest
val rfModel = RandomForestModel.load(sc, modelPath)

//Make the predictions - Here the label is the unique ID of the point
val rfPreds = labDistVect.map(p => (p.label, rfModel.predict(p.features)))

//Collect and save
println("Done Modelling, now saving preds")
val outP = rfPreds.coalesce(1, true).saveAsTextFile(outPreds)

println("Done Modelling, now saving coords")
val outC = coords.coalesce(1, true).saveAsTextFile(outCoords)
Stack trace:
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 6.0 failed 4 times, most recent failure: Lost task 0.3 in stage 6.0 (TID 40, XX.XX.XX.XX): java.lang.NullPointerException
    at GeoDistPredict1$$anonfun$38.apply(GeoDist1.scala:340)
    at GeoDistPredict1$$anonfun$38.apply(GeoDist1.scala:340)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$$anon$10.next(Iterator.scala:312)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
java nullpointerexception scala hadoop apache-spark
Dusted Oct 03 '15 at 13:40