Unusual deadlock in MPI_Allgather

After a lot of Googling, I have no idea what causes this problem. Here it is:

I have a simple call to MPI_Allgather in my code, which I have double-, triple-, and quadruple-checked to be correct (send/receive buffers are properly sized; the send/receive counts in the call are correct), but for "large" numbers of processes I get either a deadlock or MPI_ERR_TRUNCATE. The communicator used for the Allgather is split off from MPI_COMM_WORLD using MPI_Comm_split. For my current testing, rank 0 goes into one communicator and the remaining ranks go into a second communicator. With 6 total ranks or fewer, the Allgather works just fine. If I use 7 ranks, I get MPI_ERR_TRUNCATE. With 8 ranks, it deadlocks. I have verified that the communicators were split correctly (MPI_Comm_rank and MPI_Comm_size return the correct values on all ranks for both communicators).

I manually checked the size of every send and receive buffer and the maximum receive count. My first workaround was to swap the MPI_Allgather for a loop of MPI_Gather calls, one to each process. That worked for the one case in question, but changing the meshes given to my code (CFD meshes partitioned with METIS) brought the problem back. Now my solution, which I have not yet been able to break, is to replace the Allgather with an Allgatherv, which I suppose is more appropriate anyway since I send a different amount of data from each process.
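For reference, the pattern I ended up with looks roughly like this (a simplified sketch; names like nSend and sendBuf are illustrative, not the exact variables from my solver):

// Simplified sketch of the Allgatherv workaround (illustrative names).
// Each rank first shares how many ints it will send; the displacements
// are then built as a running sum of those counts.
vector<int> counts(nProcGrid);
MPI_Allgather(&nSend, 1, MPI_INT, counts.data(), 1, MPI_INT, gridComm);

vector<int> displs(nProcGrid, 0);
for (int p = 1; p < nProcGrid; p++)
  displs[p] = displs[p-1] + counts[p-1];

vector<int> recvBuf(displs[nProcGrid-1] + counts[nProcGrid-1]);
MPI_Allgatherv(sendBuf.data(), nSend, MPI_INT,
               recvBuf.data(), counts.data(), displs.data(), MPI_INT,
               gridComm);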

Here is (I hope) the relevant offending code in context; if I've missed something, the Allgather in question is on line 599 of this file.

// Get the number of mpiFaces on each processor (for later communication)
// 'nProcGrid' is the size of the communicator 'gridComm'
vector<int> nMpiFaces_proc(nProcGrid);

// This MPI_Allgather works just fine, every time
// int nMpiFaces is assigned on preceding lines
MPI_Allgather(&nMpiFaces,1,MPI_INT,nMpiFaces_proc.data(),1,MPI_INT,gridComm);

int maxNodesPerFace = (nDims==2) ? 2 : 4;
int maxNMpiFaces = getMax(nMpiFaces_proc);

// The matrix class is just a fancy wrapper around std::vector that
// allows for (i,j) indexing. The getSize() and getData() methods just
// call the size() and data() methods, respectively, of the underlying
// vector<int> object.
matrix<int> mpiFaceNodes_proc(nProcGrid,maxNMpiFaces*maxNodesPerFace);

// This is the MPI_Allgather which (sometimes) doesn't work.
// vector<int> mpiFaceNodes is assigned in preceding lines
MPI_Allgather(mpiFaceNodes.data(),mpiFaceNodes.size(),MPI_INT,
              mpiFaceNodes_proc.getData(),maxNMpiFaces*maxNodesPerFace,
              MPI_INT,gridComm);

I am currently using OpenMPI 1.6.4, g++ 4.9.2, and an 8-core AMD FX-8350 with 16 GB of RAM, running an up-to-date install of Elementary OS Freya 0.3 (essentially Ubuntu 14.04). However, I have also had this problem on another machine running CentOS with Intel hardware and MPICH2.

Any ideas? I have heard that it may be possible to change MPI's internal buffer size(s) to fix similar problems, but a quick attempt to do so (as shown in http://www.caps.ou.edu/pipermail/arpssupport/2002-May/000361.html ) had no effect.

For reference, this problem is very similar to the one described here: https://software.intel.com/en-us/forums/topic/285074 , except that in my case I have only 1 processor with 8 cores, on a single desktop machine.

UPDATE: I managed to put together a minimal example of this failure:

#include <iostream>
#include <vector>
#include <stdlib.h>
#include <time.h>

#include "mpi.h"

using namespace std;

int main(int argc, char* argv[])
{
  MPI_Init(&argc,&argv);

  int rank, nproc, newID, newRank, newSize;
  MPI_Comm newComm;
  MPI_Comm_rank(MPI_COMM_WORLD,&rank);
  MPI_Comm_size(MPI_COMM_WORLD,&nproc);

  newID = rank%2;
  MPI_Comm_split(MPI_COMM_WORLD,newID,rank,&newComm);
  MPI_Comm_rank(newComm,&newRank);
  MPI_Comm_size(newComm,&newSize);

  srand(time(NULL));

  // Get a different 'random' number for each rank on newComm
  //int nSend = rand()%10000;
  //for (int i=0; i<newRank; i++) nSend = rand()%10000;

  /*! -- Found a set of # which fail for nproc=8: -- */
  int badSizes[4] = {2695,7045,4256,8745};
  int nSend = badSizes[newRank];

  cout << "Comm " << newID << ", rank " << newRank << ": nSend = " << nSend << endl;

  vector<int> send(nSend);
  for (int i=0; i<nSend; i++)
    send[i] = rand();

  vector<int> nRecv(newSize);
  MPI_Allgather(&nSend,1,MPI_INT,nRecv.data(),1,MPI_INT,newComm);

  int maxNRecv = 0;
  for (int i=0; i<newSize; i++)
    maxNRecv = max(maxNRecv,nRecv[i]);

  vector<int> recv(newSize*maxNRecv);

  MPI_Barrier(MPI_COMM_WORLD);

  cout << "rank " << rank << ": Allgather-ing data for communicator " << newID << endl;

  MPI_Allgather(send.data(),nSend,MPI_INT,recv.data(),maxNRecv,MPI_INT,newComm);

  cout << "rank " << rank << ": Done Allgathering-data for communicator " << newID << endl;

  MPI_Finalize();
  return 0;
}

The above code has been compiled and run as:

mpicxx -std=c++11 mpiTest.cpp -o mpitest
mpirun -np 8 ./mpitest

with the following output on my 16-core CentOS and 8-core Ubuntu machines:

Comm 0, rank 0: nSend = 2695
Comm 1, rank 0: nSend = 2695
Comm 0, rank 1: nSend = 7045
Comm 1, rank 1: nSend = 7045
Comm 0, rank 2: nSend = 4256
Comm 1, rank 2: nSend = 4256
Comm 0, rank 3: nSend = 8745
Comm 1, rank 3: nSend = 8745
rank 5: Allgather-ing data for communicator 1
rank 6: Allgather-ing data for communicator 0
rank 7: Allgather-ing data for communicator 1
rank 0: Allgather-ing data for communicator 0
rank 1: Allgather-ing data for communicator 1
rank 2: Allgather-ing data for communicator 0
rank 3: Allgather-ing data for communicator 1
rank 4: Allgather-ing data for communicator 0
rank 5: Done Allgathering-data for communicator 1
rank 3: Done Allgathering-data for communicator 1
rank 4: Done Allgathering-data for communicator 0
rank 2: Done Allgathering-data for communicator 0

Note that only 2 ranks from each communicator exit the Allgather; this is not what happens in my actual code (there, no ranks on the "broken" communicator exit the Allgather), but the end result is the same: the code hangs until I kill it.

I'm guessing this has something to do with the differing number of sends on each process, but as far as I can tell from the MPI documentation and tutorials I've seen, this is supposed to be allowed, right? Of course, MPI_Allgatherv would be a bit more applicable here, but for simplicity's sake I have been using Allgather instead.

1 answer

You must use MPI_Allgatherv if the amounts of input are not identical across all processes.

To be precise, what has to match is the type signature count,type, since technically you can arrive at the same fundamental representation with different datatypes (e.g. N elements of a basic type vs. 1 element of a contiguous type made of N elements). But if you use the same datatype argument everywhere, as is common usage of MPI collectives, then your counts must match everywhere.
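As a quick, contrived illustration of that equivalence (my example, not from the standard): sending N separate MPI_INTs and receiving 1 element of a contiguous type built from N ints have the same type signature, so the following is legal even though the literal count arguments differ on the two sides.

// Contrived example: N MPI_INTs vs. 1 element of a contiguous type of N ints
// are the same (count,type) signature. Assumes 'comm' is a valid communicator
// and 'nproc' is its size.
const int N = 4;
vector<int> send(N), recv(N * nproc);

MPI_Datatype vecType;
MPI_Type_contiguous(N, MPI_INT, &vecType);  // 1 element of vecType == N ints
MPI_Type_commit(&vecType);

MPI_Allgather(send.data(), N, MPI_INT,      // send N ints...
              recv.data(), 1, vecType,      // ...receive 1 vecType per rank
              comm);

MPI_Type_free(&vecType);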

The relevant part of the latest MPI standard (3.1) is given on page 165:

The type signature associated with sendcount, sendtype at a process must be equal to the type signature associated with recvcount, recvtype at any other process.
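Applied to the minimal example above, a sketch of the Allgatherv fix would look something like this (reusing the variable names from that program; nRecv already holds the per-rank counts):

// Sketch: replace the failing Allgather in the minimal example with an
// Allgatherv, using the counts already gathered into nRecv.
vector<int> displs(newSize, 0);
for (int i = 1; i < newSize; i++)
  displs[i] = displs[i-1] + nRecv[i-1];

vector<int> recv(displs[newSize-1] + nRecv[newSize-1]);
MPI_Allgatherv(send.data(), nSend, MPI_INT,
               recv.data(), nRecv.data(), displs.data(), MPI_INT, newComm);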

