I would like to know how collectAsMap works in Spark. In particular, where is the data from all partitions aggregated: at the master or at the workers? In the first case, each worker sends its data to the master, and once the master has collected the data from every worker, it aggregates the results itself. In the second case, the workers combine the results (after exchanging data among themselves) and then send the combined result to the master.
It is very important for me to find a way for the master to collect the data from each partition separately, without any exchange of data between the workers.
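To make the first case concrete, here is a minimal pure-Python sketch of the behaviour I am hoping for. It is not Spark code: the nested lists stand in for partitions held by workers, and the names `partitions` and `collect_as_map_on_master` are hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, int]


def collect_as_map_on_master(partitions: List[List[Pair]]) -> Dict[str, int]:
    """Hypothetical model of the first case: no worker-to-worker exchange."""
    result: Dict[str, int] = {}
    for partition in partitions:      # each worker ships its partition unchanged
        for key, value in partition:  # the master alone merges the pairs
            result[key] = value       # a later duplicate key overwrites an earlier one
    return result


# Three simulated partitions; the key "a" appears in two of them.
partitions = [[("a", 1), ("b", 2)], [("c", 3)], [("a", 4)]]
merged = collect_as_map_on_master(partitions)
print(merged)  # {'a': 4, 'b': 2, 'c': 3}
```

In this model all merging happens in a single place, so duplicate keys are resolved only on the master, which is the property I need.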
distributed-computing apache-spark worker
Χρήστος Μάλλιος