Spark: flatMapValues question

I am reading the Learning Spark book and cannot understand the following transformation on a pair RDD.

rdd.flatMapValues(x => (x to 5)) 

It is applied to the RDD {(1,2),(3,4),(3,6)}, and the result of the transformation is {(1,2),(1,3),(1,4),(1,5),(3,4),(3,5)}.

Can someone explain this?

2 answers

The flatMapValues method is a combination of flatMap and mapValues.

Let's start with this RDD.

 val sampleRDD = sc.parallelize(Array((1,2),(3,4),(3,6))) 

mapValues transforms the values while keeping the keys.

For example, collecting sampleRDD.mapValues(x => x to 5) returns

 Array((1,Range(2, 3, 4, 5)), (3,Range(4, 5)), (3,Range())) 

Note that the key-value pair (3, 6) becomes (3,Range()), since 6 to 5 creates an empty collection of values.
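The behavior of `to` can be checked in plain Scala without Spark. This is a small sketch; the object and value names are only for illustration.

```scala
object RangeDemo {
  // `a to b` builds an inclusive range, counting up from a to b.
  val fromTwo: List[Int] = (2 to 5).toList   // elements 2, 3, 4, 5
  val fromFour: List[Int] = (4 to 5).toList  // elements 4, 5

  // When the start is greater than the end, the range has no elements.
  val fromSix: List[Int] = (6 to 5).toList
}
```

Here RangeDemo.fromSix is empty, which is why the pair (3, 6) contributes nothing to the final result.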


flatMap "flattens" collections into their individual elements. More precise descriptions of flatMap are easy to find online.

For instance, given val rdd2 = sampleRDD.mapValues(x => x to 5), if we do rdd2.flatMap { case (k, vs) => vs.map(v => (k, v)) } and collect the result, we get

 Array((1,2),(1,3),(1,4),(1,5),(3,4),(3,5)). 

That is, for each element in the collection of each key, we create a pair (key, element) .

Also note that (3, Range()) does not produce any (key, element) pair, since the sequence is empty.

Now, combining flatMap and mapValues, you get flatMapValues.
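The two steps can be modeled with ordinary Scala collections, which makes them easy to try without a Spark cluster. This is only a sketch of the semantics: the Seq stands in for the pair RDD, and the local flatMapValues helper is a hypothetical stand-in, not Spark's implementation.

```scala
object FlatMapValuesModel {
  // Local stand-in for the pair RDD: a plain Seq of (key, value) tuples.
  val sample: Seq[(Int, Int)] = Seq((1, 2), (3, 4), (3, 6))

  // Step 1, like mapValues: transform each value and keep its key.
  val mapped: Seq[(Int, Range)] = sample.map { case (k, v) => (k, v to 5) }

  // Step 2, the flattening: emit one (key, element) pair per element.
  val flattened: Seq[(Int, Int)] =
    mapped.flatMap { case (k, vs) => vs.map(e => (k, e)) }

  // Both steps fused into one (hypothetical helper, for illustration only).
  def flatMapValues[K, V, U](pairs: Seq[(K, V)])(f: V => Iterable[U]): Seq[(K, U)] =
    pairs.flatMap { case (k, v) => f(v).map(u => (k, u)) }

  val fused: Seq[(Int, Int)] = flatMapValues(sample)(x => x to 5)
}
```

Both flattened and fused contain (1,2),(1,3),(1,4),(1,5),(3,4),(3,5). In Spark itself the same result comes from sampleRDD.flatMapValues(x => x to 5).collect().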


flatMapValues applies a function to each value while keeping the key it is associated with. In the case above, x to 5 expands each value x into the range from x up to 5, inclusive.

Taking the first pair, (1,2), the key is 1 and the value is 2, so after applying the transformation it becomes {(1,2),(1,3),(1,4),(1,5)}.

Hope this helps.
