There is no significant performance penalty: JavaRDD is a thin wrapper around RDD that makes calls from Java code more convenient. It holds the original RDD as a member and delegates each method call to that member, for example (from JavaRDD.scala):
def cache(): JavaRDD[T] = wrapRDD(rdd.cache())
wrapRDD boils down to something like new JavaRDD[T](rdd), so the only overhead is the creation of a thin wrapper object on each method call. That cost is insignificant because it is paid once per call on the RDD object, not once per element in the RDD.
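To make the "once per call, not once per element" point concrete, here is a minimal toy model in plain Java. It does not use Spark: ToyRDD, ToyJavaRDD, wrappersCreated and wrappersFor are hypothetical names invented for this sketch, and the real JavaRDD is written in Scala and takes Spark's own function types. The sketch only illustrates the delegation pattern described above by counting how many wrapper objects get created.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class WrapperOverheadSketch {

    /** Hypothetical stand-in for RDD: owns the data and does the per-element work. */
    static class ToyRDD<T> {
        final List<T> data;

        ToyRDD(List<T> data) { this.data = data; }

        <U> ToyRDD<U> map(Function<T, U> f) {
            List<U> out = new ArrayList<>(data.size());
            for (T x : data) {
                out.add(f.apply(x)); // executed once per element
            }
            return new ToyRDD<>(out);
        }
    }

    /** Hypothetical stand-in for JavaRDD: keeps the wrapped RDD as a field and delegates. */
    static class ToyJavaRDD<T> {
        static int wrappersCreated = 0; // counts wrapper allocations for the demo
        final ToyRDD<T> rdd;

        ToyJavaRDD(ToyRDD<T> rdd) {
            this.rdd = rdd;
            wrappersCreated++;
        }

        <U> ToyJavaRDD<U> map(Function<T, U> f) {
            // Same shape as wrapRDD(rdd.map(f)): delegate, then wrap the result once.
            return new ToyJavaRDD<>(rdd.map(f));
        }
    }

    /** Builds a dataset of the given size, chains two map calls, returns the wrapper count. */
    static int wrappersFor(int elementCount) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < elementCount; i++) {
            data.add(i);
        }
        ToyJavaRDD.wrappersCreated = 0;
        ToyJavaRDD<Integer> wrapped = new ToyJavaRDD<>(new ToyRDD<>(data));
        wrapped.map(x -> x + 1).map(x -> x * 2);
        return ToyJavaRDD.wrappersCreated;
    }

    public static void main(String[] args) {
        System.out.println(wrappersFor(10));     // prints 3
        System.out.println(wrappersFor(100000)); // prints 3
    }
}
```

In this toy model the wrapper count is 3 in both runs (one initial wrapper plus one per map call), regardless of how many elements the dataset holds, which is the reason the wrapping cost is negligible next to the per-element work.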