Non-covariant Scala RDD workaround

I am trying to write a function to work with RDD[Seq[String]] objects, for example:

def foo(rdd: RDD[Seq[String]]) = { println("hi") } 

This function cannot be called for objects of type RDD[Array[String]]:

 val testRdd: RDD[Array[String]] = sc.textFile("somefile").map(_.split("\\|", -1))
 foo(testRdd)
 -> error: type mismatch;
    found   : org.apache.spark.rdd.RDD[Array[String]]
    required: org.apache.spark.rdd.RDD[Seq[String]]

I assume this is because RDD is not covariant.
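Just to spell out the invariance for myself (a minimal sketch with a plain class standing in for RDD, not Spark code):

 // Box is invariant in A, just like RDD is invariant in T, so
 // Box[List[String]] is not a Box[Seq[String]] even though List[String] <: Seq[String].
 class Box[A](val value: A)

 def bar(box: Box[Seq[String]]): Unit = println(box.value)

 val listBox = new Box[List[String]](List("a", "b"))
 // bar(listBox)   // error: type mismatch; Box[List[String]] is not a Box[Seq[String]]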

I tried a bunch of foo definitions to get around this. The closest I got was:

 def foo2[T[String] <: Seq[String]](rdd: RDD[T[String]]) = { println("hi") } 

But it still doesn't compile:

 foo2(testRdd)
 -> <console>:101: error: inferred type arguments [Array] do not conform to method foo2 type parameter bounds [T[String] <: Seq[String]]
           foo2(testRdd)
           ^
    <console>:101: error: type mismatch;
    found   : org.apache.spark.rdd.RDD[Array[String]]
    required: org.apache.spark.rdd.RDD[T[String]]

Any idea how I can get around this? All this happens in the Spark shell.

Tags: types, scala, covariance, apache-spark
1 answer

You can use a view bound for this.

Array is not a Seq, but it can be viewed as one.

 def foo[T <% Seq[String]](rdd: RDD[T]) = ??? 

<% says that T can be viewed as a Seq[String], so whenever you call a Seq[String] method on a T, the T will be converted to a Seq[String].
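With that, the call from the question should compile. A sketch, assuming the sc and testRdd from the question are still in scope (the body here is only illustrative, to show the view being used):

 import org.apache.spark.rdd.RDD

 // The view bound supplies an implicit T => Seq[String], so Seq methods
 // such as mkString become available on rows of type T (here Array[String]).
 def foo[T <% Seq[String]](rdd: RDD[T]): Unit =
   rdd.take(3).foreach(row => println(row.mkString("|")))

 foo(testRdd)   // compiles: Array[String] can be viewed as Seq[String]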

For an Array[A] to be viewed as a Seq[A], there must be an implicit conversion in scope that can turn Arrays into Seqs. As Ionuț G. Stan said, it exists in scala.Predef.
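A view bound is just sugar for an implicit conversion parameter, so if you want to avoid the <% syntax (it has been deprecated in newer Scala versions) you can spell the same thing out explicitly. A sketch, with a hypothetical foo3:

 import org.apache.spark.rdd.RDD

 // Equivalent desugaring of the view bound: the compiler must find an
 // implicit T => Seq[String]; for Array[String] the Array-to-Seq wrapper
 // in scala.Predef satisfies it.
 def foo3[T](rdd: RDD[T])(implicit asSeq: T => Seq[String]) = ???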
