Ungrouping a (key, list(values)) pair in Spark/Scala

I have data formatted as follows:

DataRDD = [(String, List[String])]

The first String is the key, and the List[String] contains the values. Note that the number of values differs per key (but is never zero). I am looking to map the RDD so that each element in the list yields its own (key, value) pair. To clarify, suppose the entire RDD consists of the following:

DataRDD = [(1, [a, b, c]), 
           (2, [d, e]),
           (3, [a, e, f])]

Then I would like to get the result:

DataKV  = [(1, a),
           (1, b),
           (1, c),
           (2, d),
           (2, e),
           (3, a),
           (3, e),
           (3, f)]

Afterwards, I would like to find all combinations of keys that share a value. This could be returned as a list per key, even for keys that share no value with any other key:

DataID  = [(1, [3]),
           (2, [3]),
           (3, [1, 2])]

You can use flatMapValues, which flattens each value collection into separate pairs while keeping the key:

val DataRDD = sc.parallelize(Array((1, Array("a", "b", "c")), (2, Array("d", "e")), (3, Array("a", "e", "f"))))

DataRDD.flatMapValues(x => x).collect

Array((1,a), (1,b), (1,c), (2,d), (2,e), (3,a), (3,e), (3,f))
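
The flatMapValues call only covers the first half of the question. For the second half (finding keys that share a value), one possible approach is sketched below. It is not part of the original answer: the object name SharedValues, the function name keysSharingValues, and the local[*] master are made up for illustration, and keys are assumed to be Int with List[String] values as in the example data. The idea is to flatten the lists, swap to (value, key), group the keys per value, pair up distinct keys, and left-outer-join back onto the full key set so that keys without a match still appear with an empty list.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical helper object; not part of the original answer.
object SharedValues {

  // For every key, list the other keys that share at least one value with it.
  def keysSharingValues(data: RDD[(Int, List[String])]): RDD[(Int, List[Int])] = {
    // One (key, value) pair per list element.
    val dataKV: RDD[(Int, String)] = data.flatMapValues(x => x)

    // Swap to (value, key) and collect all keys seen for each value.
    val keysByValue: RDD[(String, Iterable[Int])] = dataKV.map(_.swap).groupByKey()

    // Emit (key, otherKey) for each ordered pair of distinct keys sharing a value.
    val matches: RDD[(Int, Int)] = keysByValue.flatMap { case (_, keys) =>
      for (k <- keys; other <- keys if other != k) yield (k, other)
    }.distinct()

    // Left outer join so that keys without any match keep an empty list.
    data.keys.distinct().map(k => (k, ()))
      .leftOuterJoin(matches.groupByKey())
      .mapValues { case (_, others) => others.map(_.toList.sorted).getOrElse(Nil) }
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local master, so the sketch runs outside spark-shell.
    val conf = new SparkConf().setAppName("shared-values").setMaster("local[*]")
    val sc = new SparkContext(conf)

    val dataRDD = sc.parallelize(Seq(
      (1, List("a", "b", "c")),
      (2, List("d", "e")),
      (3, List("a", "e", "f"))))

    // Print one (key, matching keys) pair per line, ordered by key.
    keysSharingValues(dataRDD).sortByKey().collect().foreach(println)

    sc.stop()
  }
}
```

In spark-shell, where sc already exists, the body of keysSharingValues can be pasted and applied to the RDD directly. Note that groupByKey pulls all keys for one value onto a single executor, so a value shared by very many keys may make this approach expensive.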