How does Spark's sort shuffle work?

From https://0x0fff.com/spark-architecture-shuffle/ I know that by default the shuffle method in Spark is sort shuffle. However, the description there was not phrased clearly enough for me. How does it work?

What I understand is that each mapper writes exactly one AppendOnlyMap (what are the keys?), which is sorted (and spilled; why spilled?) into potentially multiple ... what exactly? ... and then somehow written to some indexed file (what exactly is indexed, and by which key?). I think in the end the idea is that all of these sorted and indexed files are merged with this "Min Heap Merge" thing so that there is only one big file per mapper.
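To make the pieces in the paragraph above concrete, here is a toy single-process Python model of one map task in a sort-based shuffle. It is a simplification under stated assumptions, not Spark's implementation. The function names `map_task`, `partition_of` and `fetch`, the entry-count spill threshold `max_entries`, the summing combiner, and the use of Python lists in place of on-disk files are all hypothetical. In this model the in-memory map is keyed by (partition ID, record key). Each spill writes a run sorted by that pair. The runs are merged with a min-heap into one data file per mapper, and the index stores the offset where each reducer's partition begins.

```python
import heapq
import zlib

# Hypothetical toy model of ONE map task in a sort-based shuffle.
# Not Spark's code: names, the spill threshold and the list-based
# "files" are simplifications for illustration only.

def partition_of(key, num_partitions):
    # Hypothetical stable hash partitioner: decides which reducer gets this key.
    return zlib.crc32(key.encode()) % num_partitions

def map_task(records, num_partitions, max_entries):
    """Combine in memory, spill sorted runs, merge them, and build an index."""
    in_memory = {}  # stands in for the append-only map: (partition, key) -> combined value
    spills = []     # each spill is a sorted run that would be a temp file on disk

    def spill():
        # Every run is sorted by (partition, key), so all runs share one ordering.
        spills.append(sorted(in_memory.items()))
        in_memory.clear()

    for key, value in records:
        slot = (partition_of(key, num_partitions), key)
        in_memory[slot] = in_memory.get(slot, 0) + value  # map-side combine (sum)
        if len(in_memory) >= max_entries:
            spill()  # memory "full": write a sorted run and start over
    if in_memory:
        spill()  # leftover in-memory contents become the last run

    # k-way merge of all sorted runs; heapq.merge uses a min-heap internally.
    data = []  # the single output "data file" for this mapper
    for slot, value in heapq.merge(*spills):
        if data and data[-1][0] == slot:
            # The same key was spilled more than once: combine again while merging.
            data[-1] = (slot, data[-1][1] + value)
        else:
            data.append((slot, value))

    # Index: start offset of each partition's segment, plus the final end offset.
    index = [0] * (num_partitions + 1)
    for (partition, _), _ in data:
        index[partition + 1] += 1
    for p in range(num_partitions):
        index[p + 1] += index[p]
    return data, index

def fetch(data, index, partition):
    # A reducer reads only its own contiguous segment, located via the index.
    return [(key, value) for (_, key), value in data[index[partition]:index[partition + 1]]]
```

With, for example, `records = [("a", 1), ("b", 1), ("a", 1), ("c", 1), ("b", 1), ("d", 1)]`, `num_partitions=2` and `max_entries=2`, the model produces three spills. Keys `"a"` and `"b"` each land in two of them and are combined again during the merge. Because the merged file is sorted by partition first, each reducer's data is one contiguous slice that the index can point to.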

As you can see, there are more holes (things I do not understand) than Swiss cheese (things I understand)...
