Scalaz type classes for Apache Spark RDD

The goal is to implement the various type classes (e.g. Semigroup, Monad, Functor, etc.) provided by Scalaz for Spark's RDD (distributed collection). Unfortunately, I cannot get any of the type classes that operate on higher-kinded types (e.g. Monad, Functor, etc.) to work with RDD.

RDDs are defined (simplified) as:

 abstract class RDD[T: ClassTag]() {
   def map[U: ClassTag](f: T => U): RDD[U] = { ... }
 }

The full code for RDD can be found here.

Here is one example that works great:

 import scalaz._, Scalaz._
 import org.apache.spark.rdd.RDD

 implicit def semigroupRDD[A] = new Semigroup[RDD[A]] {
   def append(x: RDD[A], y: => RDD[A]) = x.union(y)
 }
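
With that instance in scope, two RDDs can be combined with Scalaz's |+| operator. A minimal usage sketch (not part of the original post), assuming a running SparkContext named sc:

 val xs: RDD[Int] = sc.parallelize(Seq(1, 2, 3))
 val ys: RDD[Int] = sc.parallelize(Seq(4, 5, 6))
 // |+| resolves to semigroupRDD[Int] and delegates to union
 val combined: RDD[Int] = xs |+| ys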

Here is one example that does not work:

 implicit def functorRDD = new Functor[RDD] {
   override def map[A, B](fa: RDD[A])(f: A => B): RDD[B] = {
     fa.map(f)
   }
 }

This fails:

 Error: No ClassTag available for B
     fa.map(f)

The error is pretty clear: map as implemented on RDD expects a ClassTag (see above). Scalaz functors, monads, etc., do not carry ClassTags. Is it possible to make this work without changing Scalaz and/or Spark?

1 answer

Short answer: no

For type classes like Functor, the requirement is that for any A and B, with no constraints, given a function A => B you get the lifted function RDD[A] => RDD[B]. In Spark you cannot pick arbitrary A and B, because you need a ClassTag for B, as you saw.
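
For reference, this is roughly the shape of Scalaz's Functor (simplified sketch; the real trait has more members). map is parametric in both A and B, so there is nowhere to demand a ClassTag[B]:

 // Simplified sketch of the Functor type class
 trait Functor[F[_]] {
   def map[A, B](fa: F[A])(f: A => B): F[B]
 }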

For other type classes, such as Semigroup, where the type does not change within the operation and therefore no ClassTag is needed, it works.
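
Compare with the shape of Semigroup (again a simplified sketch): the only type involved is fixed when the instance is created, so the RDD[A] instance above never has to produce a tag for a new result type:

 // Simplified sketch of the Semigroup type class
 trait Semigroup[A] {
   def append(a1: A, a2: => A): A
 }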

