Is there any documentation on how MPI collective functions such as MPI_Allgather, MPI_Alltoall, and MPI_Allreduce are implemented?
I would like to learn which algorithms they use and calculate their complexity in terms of unidirectional or bidirectional bandwidth and the total amount of data transferred, for multiple nodes and a fixed data size.