Warning when using an RDD in a for comprehension

I get a warning when using an RDD in a for comprehension, and I'm not sure whether it's something I'm doing wrong. If I do this:

    val sc = new SparkContext(...)
    val anRDD = sc.parallelize(List(
      ("a", List(1, 2, 3)),
      ("b", List(4)),
      ("c", List(5, 6))
    ))
    for {
      (someString, listOfInts) <- anRDD
      someInt <- listOfInts
    } yield (someString, someInt)

Then I get this output:

    warning: `withFilter' method does not yet exist on org.apache.spark.rdd.RDD[(String, List[Int])], using `filter' method instead
                   (s, li) <- rl

But it still successfully returns a FlatMappedRDD[(String, Int)]. Am I doing something wrong, or is it safe to ignore this warning?

Update: I would also accept as an answer an explanation of how the for comprehension gets converted into map / flatMap / filter calls, since I didn't think any filter or withFilter calls would be required. I assumed it would be equivalent to something like this:

    anRDD.flatMap(tuple => tuple._2.map(someInt => (tuple._1, someInt)))

But that doesn't include any filter or withFilter calls, which appear to be the source of the warning.
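To see where the extra call comes from, here is a minimal sketch on a plain Scala List (no Spark needed) that compares the comprehension with a hand-written desugaring; the `withFilter { case (_, _) => true }` step is my reconstruction of what the compiler inserts for the tuple pattern on the left of `<-`:

```scala
object DesugarSketch extends App {
  val data = List(
    ("a", List(1, 2, 3)),
    ("b", List(4)),
    ("c", List(5, 6))
  )

  // What the for comprehension roughly desugars to: a withFilter
  // checking that each element matches the (someString, listOfInts)
  // pattern, then a flatMap over the inner list.
  val desugared = data
    .withFilter { case (_, _) => true } // a tuple pattern always matches here
    .flatMap { case (someString, listOfInts) =>
      listOfInts.map(someInt => (someString, someInt))
    }

  val comprehension = for {
    (someString, listOfInts) <- data
    someInt <- listOfInts
  } yield (someString, someInt)

  assert(desugared == comprehension)
  println(comprehension)
}
```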

Oh, and I'm using Spark 1.2.0 and Scala 2.10.4, all in the REPL.

+5
2 answers

First of all, I'm not an expert, but I did some digging, and here's what I found:

I compiled the code with scalac -print (because JavaDecompiler wasn't working for some reason), which prints the program with all the Scala-specific features desugared. There I saw this:

    test.this.anRDD().filter({
      (new anonymous class anonfun$1(): Function1)
    }).flatMap({
      (new anonymous class anonfun$2(): Function1)
    }, ClassTag.apply(classOf[scala.Tuple2]));

You will notice the filter, so I checked what anonfun$1 does:

    public final boolean apply(Tuple2<String, List<Object>> check$ifrefutable$1)
    {
      Tuple2 localTuple2 = check$ifrefutable$1;
      boolean bool;
      if (localTuple2 != null) {
        bool = true;
      } else {
        bool = false;
      }
      return bool;
    }

So, putting it all together, it seems the filter happens in the comprehension because it filters out anything that is NOT a Tuple2.
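That filtering behavior is easiest to see with a pattern that can actually fail to match. A minimal sketch in plain Scala 2 (no Spark), where the Some(x) pattern on the left of <- makes the compiler insert a withFilter that silently drops non-matching elements:

```scala
object PatternFilterDemo extends App {
  val opts: List[Option[Int]] = List(Some(1), None, Some(3))

  // The Some(x) pattern causes a withFilter (or filter, if withFilter
  // is unavailable) that discards elements the pattern doesn't match:
  val kept = for (Some(x) <- opts) yield x
  assert(kept == List(1, 3)) // the None was filtered out

  // A rough hand-written equivalent of what the compiler generates:
  val desugared = opts
    .withFilter { case Some(_) => true; case _ => false }
    .map { case Some(x) => x }
  assert(desugared == kept)
}
```

A tuple pattern like (someString, listOfInts) can never actually fail against an RDD[(String, List[Int])], but the compiler still emits the check, which is why the warning appears.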

Also, the compiler prefers withFilter over filter when it is available (not sure why at the moment). You can see that by decompiling the same comprehension over a regular List instead of an RDD:

    object test {
      val regList = List(
        ("a", List(1, 2, 3)),
        ("b", List(4)),
        ("c", List(5, 6))
      )
      val foo = for {
        (someString, listOfInts) <- regList
        someInt <- listOfInts
      } yield (someString, someInt)
    }

Which decompiles to:

    test.this.regList().withFilter({
      (new anonymous class anonfun$1(): Function1)
    }).flatMap({
      (new anonymous class anonfun$2(): Function1)
    }, immutable.this.List.canBuildFrom()).$asInstanceOf[List]();

So it is the same thing, except that it uses withFilter where it can.
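As for why withFilter is preferred: it is lazy. filter eagerly builds an intermediate collection, while withFilter just records the predicate and fuses it into the following map/flatMap/foreach, so no intermediate collection is allocated. A small plain-Scala sketch that makes the difference observable by counting predicate calls:

```scala
object WithFilterVsFilter extends App {
  val xs = List(1, 2, 3, 4)

  var eagerCalls = 0
  // filter runs immediately and allocates an intermediate List
  val intermediate = xs.filter { n => eagerCalls += 1; n % 2 == 0 }
  assert(eagerCalls == 4) // predicate already evaluated for every element

  var lazyCalls = 0
  // withFilter only wraps the list; the predicate has not run yet
  val pending = xs.withFilter { n => lazyCalls += 1; n % 2 == 0 }
  assert(lazyCalls == 0) // nothing evaluated so far

  // It runs only when a subsequent map forces it, fused with the map,
  // with no intermediate collection in between
  val result = pending.map(_ * 10)
  assert(lazyCalls == 4)
  assert(result == intermediate.map(_ * 10)) // List(20, 40)
}
```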

+1

Call collect() on the RDD before feeding it into the for comprehension. Note that collect() brings the entire dataset back to the driver, so this is only practical for small RDDs:

    val collectedList = anRDD.collect
    for {
      (someString, listOfInts) <- collectedList
      someInt <- listOfInts
    } yield (someString, someInt)
0

Source: https://habr.com/ru/post/1211521/
