This is almost the same as the pipeline code.
The first is obvious what will happen when you seem clear, however the chain will lead to the following (simplified):
MapPartitionsRDD( MapPartitionsRDD( MapPartitionsRDD( rdd, iter.map(g)), iter.map(f)), iter.map(h))
Simplification of further visualization:
map(map(map(rdd,g),f),h)
What upon execution comes down to:
h(f(g(rddItem)))
Sounds familiar? All this is a chain of pipeline computing ... brought to you by the joys of lazy appreciation.
This can be seen in the example:
def f(x: Int) = {println(s"f$x");x} def g(x: Int) = {println(s"g$x");x} def h(x: Int) = {println(s"h$x");x} val rdd = sc.makeRDD(1 to 3, 1) rdd.map(x => h(f(g(x)))) g1 f1 h1 g2 f2 h2 g3 f3 h3 rdd.map(g).map(f).map(h) g1 f1 h1 g2 f2 h2 g3 f3 h3
source share