When I run a job using Spark, I see log lines like the following:
[Stage 0:> (0 + 32) / 32]
Here, 32 corresponds to the number of RDD partitions I requested.
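For reference, here is a minimal sketch of how I request the partitions (the app name, data, and the count of 32 are placeholders, not my actual job):

    from pyspark.sql import SparkSession

    # Build a SparkContext via the SparkSession entry point
    spark = SparkSession.builder.appName("partition-demo").getOrCreate()
    sc = spark.sparkContext

    # Explicitly request 32 partitions when creating the RDD
    rdd = sc.parallelize(range(1000000), 32)
    print(rdd.getNumPartitions())  # prints 32, matching the "/ 32" in the progress bar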
However, I do not understand why there are several stages and what exactly happens at each stage.
Each stage also seems to take a lot of time. Is it possible to do this in fewer stages?
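For context, a simplified version of the kind of pipeline where I see this behavior (hypothetical transformations, not my real code, and reusing `sc` from the sketch above):

    # A map followed by a grouping/aggregation step
    pairs = sc.parallelize(range(1000000), 32).map(lambda x: (x % 10, 1))
    counts = pairs.reduceByKey(lambda a, b: a + b)
    print(counts.collect())  # in my runs the console shows [Stage 0: ...] and then [Stage 1: ...]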
Tags: mapreduce, apache-spark, pyspark, apache-spark-sql
Harit vishwakarma