This is because Spark provides only Long, Double and Float accumulators by default. If you need something else, you have to extend AccumulatorParam.
    import org.apache.spark.AccumulatorParam

    object StringAccumulatorParam extends AccumulatorParam[String] {
      // Starting value for each task-local copy of the accumulator
      def zero(initialValue: String): String = {
        ""
      }

      // Combines two values; used both for += and for merging partial results
      def addInPlace(s1: String, s2: String): String = {
        s"$s1 $s2"
      }
    }

    val stringAccum = sc.accumulator("")(StringAccumulatorParam)

    val rdd = sc.parallelize("foo" :: "bar" :: Nil, 2)
    rdd.foreach(s => stringAccum += s)
    stringAccum.value
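To see what zero and addInPlace are doing, the merge can be roughly simulated without a cluster. This is only a sketch: SimpleParam, StringParam and AccumulatorSketch are hypothetical names introduced here as stand-ins, not part of Spark, and the two-stage fold is an assumption about how partial results are combined rather than Spark's actual implementation.

```scala
// Hypothetical stand-in for Spark's AccumulatorParam, so the sketch runs without Spark.
trait SimpleParam[T] {
  def zero(initialValue: T): T
  def addInPlace(a: T, b: T): T
}

// Same logic as StringAccumulatorParam in the answer.
object StringParam extends SimpleParam[String] {
  def zero(initialValue: String): String = ""
  def addInPlace(s1: String, s2: String): String = s"$s1 $s2"
}

object AccumulatorSketch {
  // Fold every partition starting from zero, then merge the partial
  // results into the initial value (assumed to mirror the driver-side merge).
  def simulate[T](partitions: Seq[Seq[T]], initial: T, p: SimpleParam[T]): T = {
    val partials = partitions.map(_.foldLeft(p.zero(initial))(p.addInPlace))
    partials.foldLeft(initial)(p.addInPlace)
  }

  def main(args: Array[String]): Unit = {
    // Two partitions holding one element each, as in sc.parallelize(..., 2)
    val result = simulate(Seq(Seq("foo"), Seq("bar")), "", StringParam)
    println(result.trim)
  }
}
```

Since addInPlace always inserts a space, the combined string picks up extra whitespace, and on a real cluster the order of the elements depends on the order in which tasks finish.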
Note
In general, you should avoid using accumulators for tasks where the data can grow significantly over time. The behavior will be similar to a group followed by collect, since every value ends up on the driver, and in the worst case the job may fail for lack of resources. Accumulators are useful mainly for simple diagnostic tasks, such as tracking basic statistics.