I have data formatted as follows:
DataRDD = [(String, List[String])]
The key is indicated on the first line, and the list contains values. Note that the number of values is different for each key (but never equal to zero). I am looking for an RDD map in such a way that for each element in the list there will be a pair of keys, values. To clarify this, imagine that the entire RDD is presented in the following list:
DataRDD = [(1, [a, b, c]),
(2, [d, e]),
(3, [a, e, f])]
Then I would like to get the result:
DataKV = [(1, a),
(1, b),
(1, c),
(2, d),
(2, e),
(3, a),
(3, e),
(3, f)]
Therefore, I would like to return all key combinations that have the same value. This can be returned to the list for each key, even if there are no identical values:
DataID = [(1, [3]),
(2, [3]),
(3, [1, 2])]
Spark Scala, , , - . .