Spark does not support nested RDDs or user-defined functions that refer to other RDDs, hence the NullPointerException; see this thread on the spark-users mailing list.
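For illustration, here is a minimal sketch of the unsupported pattern (the variable names are hypothetical, not from your code): referencing one RDD from inside a transformation on another RDD ships the closure to the workers, where the inner RDD is not usable, which typically surfaces as a NullPointerException.

val outer = sc.parallelize(Seq(1, 2, 3))
val inner = sc.parallelize(Seq(4, 5, 6))
// Broken: `inner` is captured inside outer's map() closure, so it is
// serialized to the workers, where calling RDD operations on it fails.
val broken = outer.map(x => inner.filter(_ > x).count())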
It looks like your current code is trying to group the elements of d by value; you can do this efficiently with the groupBy() RDD method:
scala> val d = sc.parallelize(Seq("Hello", "World", "Hello"))
d: spark.RDD[java.lang.String] = spark.ParallelCollection@55c0c66a

scala> d.groupBy(x => x).collect()
res6: Array[(java.lang.String, Seq[java.lang.String])] = Array((World,ArrayBuffer(World)), (Hello,ArrayBuffer(Hello, Hello)))
Josh Rosen Jan 02 '13 at 22:52