Calling distinct and map together throws NPE in spark library

I'm not sure if this is a bug, but if you do something like this:

    // d: spark.RDD[String]
    d.distinct().map(x => d.filter(_.equals(x)))

you will get a Java NPE. However, if you execute collect immediately after distinct, everything works fine.

I am using spark 0.6.1.

+4
nullpointerexception scala apache-spark
Dec 07
2 answers

Spark does not support nested RDDs or user-defined functions that refer to other RDDs, hence the NullPointerException; see this thread on the spark-users mailing list.
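This also explains why collecting first works: once the values are in driver memory, each subsequent filter is an ordinary top-level RDD operation. A minimal sketch of that workaround (assuming the distinct values fit on the driver):

    val values = d.distinct().collect()             // Array[String] in driver memory
    val groups = values.map(v => d.filter(_ == v))  // one RDD per value, built in driver code
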

It looks like your code is trying to group the elements of d by value; you can do this efficiently with the groupBy() RDD method:

    scala> val d = sc.parallelize(Seq("Hello", "World", "Hello"))
    d: spark.RDD[java.lang.String] = spark.ParallelCollection@55c0c66a

    scala> d.groupBy(x => x).collect()
    res6: Array[(java.lang.String, Seq[java.lang.String])] = Array((World,ArrayBuffer(World)), (Hello,ArrayBuffer(Hello, Hello)))
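If you only need the size of each group rather than its contents, counting per key avoids materializing each group's full buffer; this is a sketch, not part of the original answer:

    // Count occurrences of each distinct value
    val counts = d.map(x => (x, 1)).reduceByKey(_ + _)
    counts.collect()   // Array((World,1), (Hello,2))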
+7
Jan 02 '13 at 22:52

What about the windowing example given in the Spark 1.3.0 streaming programming guide?

    val dataset: RDD[(String, String)] = ...
    val windowedStream = stream.window(Seconds(20))...
    val joinedStream = windowedStream.transform { rdd => rdd.join(dataset) }

SPARK-5063 causes the example to fail, because the join is invoked on an RDD from within the transform method.
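For context, here is a minimal non-streaming sketch of the pattern SPARK-5063 guards against, using hypothetical RDDs a and b on a plain SparkContext sc:

    val a = sc.parallelize(1 to 10)
    val b = sc.parallelize(1 to 10)
    // Fails on Spark 1.3+: the closure passed to map runs on executors,
    // where `b` cannot be used; Spark raises the SPARK-5063 error message
    // instead of the older, opaque NullPointerException.
    val bad = a.map(x => b.filter(_ == x).count())
    bad.collect()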

0
Apr 10 '15 at 22:06


