What are the Spark RDD graph, the lineage graph, and the DAG of Spark tasks? How are they related?

When we talk about RDD graphs, do we mean the lineage graph, the DAG (directed acyclic graph), or both? And when is the lineage graph generated? Is it generated before the DAG of Spark tasks?

1 answer

An RDD may depend on zero or more other RDDs. For example, if you write x = y.map(...) , x depends on y . These dependency relationships can be viewed as a graph.

You can call this graph the lineage graph, since it records how each RDD is derived from its parents. It is also necessarily a DAG, since it cannot contain a cycle: a new RDD can only depend on RDDs that already exist.
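To make the dependency graph concrete, here is a minimal sketch in plain Python. It is a toy model, not Spark code: ToyRDD and lineage are hypothetical names made up for this illustration. Each toy RDD only remembers the RDDs it was derived from, and walking those parent links gives the edges of the lineage graph.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyRDD:
    """Hypothetical stand-in for an RDD: a name plus the RDDs it was derived from."""
    name: str
    parents: tuple = ()

    def map(self, name):
        # A transformation never modifies self; it returns a new node that depends on self.
        return ToyRDD(name, (self,))

    def union(self, other, name):
        # A transformation with two parents.
        return ToyRDD(name, (self, other))


def lineage(rdd):
    """Return the (child, parent) edges reachable from rdd."""
    edges = []
    seen = set()
    stack = [rdd]
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue  # already expanded via another path
        seen.add(id(node))
        for parent in node.parents:
            edges.append((node.name, parent.name))
            stack.append(parent)
    return edges


y = ToyRDD("y")
x = y.map("x")          # x depends on y
z = x.union(y, "z")     # z depends on x and on y
edges = lineage(z)
print(sorted(edges))
```

Because every node is immutable and can only point at nodes created before it, there is no way to build a cycle in this model, which is the same reason the real graph is a DAG.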

Narrow dependencies, where no shuffle is required (think map and filter ), can be collapsed into a single stage. Stages are the unit of execution, and the DAGScheduler generates them from the RDD dependency graph. Stages also depend on each other. The DAGScheduler builds this stage dependency graph (which is also necessarily a DAG) and uses it to schedule the stages.
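The idea of cutting the dependency graph into stages at shuffle boundaries can be sketched the same way. This is again a plain Python toy, not the DAGScheduler's actual algorithm: Node and build_stages are hypothetical names, each dependency is simply flagged as a shuffle or not, and the sketch assumes a tree-shaped lineage (no RDD is reused by two children).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical RDD node; deps is a tuple of (parent, is_shuffle) pairs."""
    name: str
    deps: tuple = ()


def build_stages(final):
    """Group nodes into stages, starting a new stage at every shuffle dependency.

    Returns (stages, stage_deps): stages is a list of lists of node names,
    stage_deps is a list of (child_stage_index, parent_stage_index) pairs.
    """
    stages = []
    stage_deps = []

    def visit_stage(root):
        idx = len(stages)
        members = []
        stages.append(members)
        stack = [root]
        while stack:
            node = stack.pop()
            members.append(node.name)
            for parent, is_shuffle in node.deps:
                if is_shuffle:
                    # Shuffle boundary: the parent belongs to a separate, earlier stage.
                    parent_idx = visit_stage(parent)
                    stage_deps.append((idx, parent_idx))
                else:
                    # Narrow dependency: the parent is pipelined into the same stage.
                    stack.append(parent)
        return idx

    visit_stage(final)
    return stages, stage_deps


# Operation names are only labels here; reduceByKey is marked as the shuffle step.
lines = Node("textFile")
words = Node("map", ((lines, False),))
kept = Node("filter", ((words, False),))
counts = Node("reduceByKey", ((kept, True),))
result = Node("mapValues", ((counts, False),))

stages, stage_deps = build_stages(result)
print(stages)
print(stage_deps)
```

In this toy run the chain textFile, map, filter ends up in one stage, and everything from the shuffle onward ends up in another stage that depends on the first. The stage graph is derived from the RDD graph, so it can only exist after the lineage has been defined.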
