The flatMapValues method is a combination of flatMap and mapValues.
Let's start with this RDD.
val sampleRDD = sc.parallelize(Array((1,2),(3,4),(3,6)))
mapValues transforms the values while keeping the keys unchanged.
For example, sampleRDD.mapValues(x => x to 5) returns (after collect)
Array((1,Range(2, 3, 4, 5)), (3,Range(4, 5)), (3,Range()))
Note that the key-value pair (3, 6) becomes (3,Range()), since 6 to 5 creates an empty collection of values.
flatMap “flattens” collections into their individual elements. More precise descriptions of flatMap are easy to find online.
For instance,
given val rdd2 = sampleRDD.mapValues(x => x to 5), if we do rdd2.flatMap { case (k, vs) => vs.map(v => (k, v)) }, we get
Array((1,2),(1,3),(1,4),(1,5),(3,4),(3,5)).
That is, for each element in the collection attached to a key, we create a pair (key, element).
Also note that (3, Range()) does not produce any (key, element) pair, since its sequence is empty.
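The two steps above can be sketched without Spark, using a plain Scala Seq as a local stand-in for the RDD. This is only a model of the semantics under that assumption; the object name FlatMapStepsSketch and the value names sample, mapped and flattened are made up for illustration.

```scala
object FlatMapStepsSketch {
  // Local stand-in for sampleRDD: a plain Seq of key-value pairs (no Spark involved).
  val sample: Seq[(Int, Int)] = Seq((1, 2), (3, 4), (3, 6))

  // Step 1, mirroring mapValues: transform only the value, keep the key.
  val mapped: Seq[(Int, Range)] = sample.map { case (k, v) => (k, v to 5) }

  // Step 2, mirroring flatMap: emit one (key, element) pair per element.
  // The empty range produced for (3, 6) contributes nothing.
  val flattened: Seq[(Int, Int)] =
    mapped.flatMap { case (k, vs) => vs.map(v => (k, v)) }

  def main(args: Array[String]): Unit = {
    // The pairs are (1,2), (1,3), (1,4), (1,5), (3,4), (3,5).
    println(flattened)
  }
}
```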
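Now, combining flatMap and mapValues gives you flatMapValues: sampleRDD.flatMapValues(x => x to 5) produces the same pairs, Array((1,2),(1,3),(1,4),(1,5),(3,4),(3,5)), in a single step.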
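For completeness, here is a minimal standalone sketch of the flatMapValues call, assuming Spark is on the classpath and a local master is acceptable. The object name FlatMapValuesSketch and the helper expandTo5 are hypothetical names chosen for this example; in spark-shell, where sc already exists, the single flatMapValues line is all you need.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object FlatMapValuesSketch {
  // Hypothetical helper: expands each value x into the range x to 5
  // and pairs every element of that range with the original key.
  def expandTo5(sc: SparkContext, pairs: Seq[(Int, Int)]): List[(Int, Int)] =
    sc.parallelize(pairs).flatMapValues(x => x to 5).collect().toList

  def main(args: Array[String]): Unit = {
    // Assumption: run locally with a single worker thread.
    val conf = new SparkConf().setAppName("flatMapValues-sketch").setMaster("local[1]")
    val sc = new SparkContext(conf)
    try {
      // Same input as sampleRDD; (3, 6) yields no pairs because 6 to 5 is empty.
      println(expandTo5(sc, Seq((1, 2), (3, 4), (3, 6))))
    } finally {
      sc.stop()
    }
  }
}
```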