How does Spark's sort-based shuffle work?



From https://0x0fff.com/spark-architecture-shuffle/ I know that the default shuffle method in Spark is the sort-based shuffle. However, the description there was not phrased clearly enough for me. How does it work?
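A first piece of the picture: before anything is sorted, every map-side record is tagged with the reduce partition it belongs to. In Spark the default is `HashPartitioner`, which takes the key's hash modulo the number of reduce partitions. The sketch below is plain Python, not Spark code; `partition_id` and `tag_with_partition` are hypothetical names, and `zlib.crc32` is an assumed stand-in for the Java `hashCode` Spark actually uses, chosen only so the output is deterministic.

```python
import zlib

def partition_id(key, num_partitions):
    # Hypothetical stand-in for HashPartitioner: a non-negative hash of the key
    # modulo the number of reduce partitions. crc32 is used only for determinism;
    # Spark uses the key's Java hashCode instead.
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions

def tag_with_partition(records, num_partitions):
    # Each (key, value) record emitted by a map task becomes
    # (reduce partition id, key, value).
    return [(partition_id(k, num_partitions), k, v) for k, v in records]

records = [("apple", 1), ("banana", 2), ("apple", 3), ("cherry", 4)]
tagged = tag_with_partition(records, num_partitions=3)
```

Because equal keys always hash to the same partition, all values for `"apple"` end up at the same reducer, which is the whole point of the shuffle.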

What I understand is that each mapper writes exactly one AppendOnlyMap ("what are the keys?"), which is sorted (and spilled; why spilled?) into potentially multiple ... what exactly? ... and then somehow written into some indexed file (what exactly is indexed, and by which key?). I think in the end the idea is that all of these sorted and indexed files are merged with this Min Heap Merge so that there is only one large file per mapper.
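The mechanics asked about here can be sketched as a simulation of one map task, under stated assumptions. As far as I know, Spark buffers records in memory (a `PartitionedAppendOnlyMap` when combining map-side, otherwise a `PartitionedPairBuffer`) keyed by (partition id, key); when memory runs short, the buffer is sorted and spilled to disk as one sorted run, because it no longer fits in memory. At the end the runs are k-way merged (a min-heap merge) into a single data file per map task, plus an index file of byte offsets where each reduce partition starts. In this plain-Python sketch, the "files" are lists, `spill_threshold` is a hypothetical record-count limit (Spark's real trigger is memory-based), and sorting by key within a partition is a simplification (Spark only needs partition-id order unless an ordering or aggregation is requested).

```python
import heapq
import zlib
from typing import List, Tuple

Record = Tuple[int, str, int]  # (reduce partition id, key, value)

def partition_id(key: str, num_partitions: int) -> int:
    # Deterministic stand-in for HashPartitioner (Spark uses the Java hashCode).
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def map_side_shuffle_write(records, num_partitions, spill_threshold):
    """Simulate one map task's sort-based shuffle write.

    Returns (data, index): `data` is the single output "file", ordered by
    partition id; `index[p]` is the offset where partition p starts, and
    index[num_partitions] == len(data).
    """
    buffer: List[Record] = []
    spills: List[List[Record]] = []

    for key, value in records:
        buffer.append((partition_id(key, num_partitions), key, value))
        # Buffer "full": sort it and spill it as one sorted run.
        if len(buffer) >= spill_threshold:
            spills.append(sorted(buffer))
            buffer = []
    if buffer:
        spills.append(sorted(buffer))

    # k-way merge of the sorted runs; heapq.merge uses a min-heap internally.
    data = list(heapq.merge(*spills))

    # Index: count records per partition, then prefix-sum into start offsets.
    index = [0] * (num_partitions + 1)
    for pid, _, _ in data:
        index[pid + 1] += 1
    for p in range(num_partitions):
        index[p + 1] += index[p]
    return data, index

def read_partition(data, index, p):
    # A reducer fetches only its own contiguous segment, located via the index.
    return data[index[p]:index[p + 1]]

records = [("apple", 1), ("banana", 2), ("apple", 3), ("cherry", 4), ("date", 5)]
data, index = map_side_shuffle_write(records, num_partitions=2, spill_threshold=2)
```

So the answer to "indexed by which key?" in this model is: the data file is indexed by reduce partition id, which lets each reducer read one contiguous slice instead of scanning the whole file.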

As you can see, there are more holes (things I do not understand) than Swiss cheese (things I understand) ...


apache-spark

Make42 May 30 '16 at 22:16
