MPI Fortran code: how to share data on a node via OpenMP?

I am working on Fortran code that already uses MPI.

Now I am faced with a situation where the data set grows very large but is the same for every process, so I would prefer to store it in memory only once per node, with all processes on the same node accessing the same data.

Storing it once per process would exceed the available RAM.

How can I achieve something like this with OpenMP?

Sharing data on a node is the only thing I need; I do not need any other on-node parallelization, because that is already handled by MPI.

+7
memory-management memory fortran openmp mpi
3 answers

You do not need to go hybrid MPI+OpenMP if it is only for sharing a chunk of data. What you actually have to do is:

1) Split the world communicator into groups of processes that run on the same host/node. This is trivial if your MPI library implements MPI-3.0: all you have to do is call MPI_COMM_SPLIT_TYPE with split_type set to MPI_COMM_TYPE_SHARED:

    USE mpi_f08

    TYPE(MPI_Comm) :: hostcomm

    CALL MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0, &
                             MPI_INFO_NULL, hostcomm)

MPI-2.2 or earlier does not provide the MPI_COMM_SPLIT_TYPE operation, so you have to get somewhat creative. For example, you could use my simple split-by-host implementation, which can be found on Github here. A minimal sketch of the same idea is shown below.
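
For illustration, here is a minimal sketch of one possible pre-MPI-3.0 split-by-host approach (this is not the Github implementation mentioned above; the subroutine and variable names are made up, and it assumes the strings returned by MPI_GET_PROCESSOR_NAME uniquely identify nodes):

    subroutine split_by_host(comm, hostcomm, ierr)
      use mpi
      implicit none
      integer, intent(in)  :: comm
      integer, intent(out) :: hostcomm, ierr
      character(len=MPI_MAX_PROCESSOR_NAME) :: myname
      character(len=MPI_MAX_PROCESSOR_NAME), allocatable :: names(:)
      integer :: rank, nprocs, namelen, color, i

      call MPI_Comm_rank(comm, rank, ierr)
      call MPI_Comm_size(comm, nprocs, ierr)
      call MPI_Get_processor_name(myname, namelen, ierr)
      ! Pad with blanks so that full-length string comparison works
      myname(namelen+1:) = ' '

      ! Every rank collects the host name of every other rank
      allocate(names(nprocs))
      call MPI_Allgather(myname, MPI_MAX_PROCESSOR_NAME, MPI_CHARACTER, &
                         names, MPI_MAX_PROCESSOR_NAME, MPI_CHARACTER, &
                         comm, ierr)

      ! The colour is the lowest rank that reports the same host name,
      ! so all ranks on one node end up in the same sub-communicator
      color = rank
      do i = 1, nprocs
         if (names(i) == myname) then
            color = i - 1
            exit
         end if
      end do
      call MPI_Comm_split(comm, color, rank, hostcomm, ierr)
      deallocate(names)
    end subroutine split_by_host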

2) Now that the processes residing on the same node are part of the same hostcomm communicator, they can create a block of shared memory and use it to exchange data. Again, MPI-3.0 provides a (relatively) simple and portable way to do this:

    USE mpi_f08
    USE, INTRINSIC :: ISO_C_BINDING, ONLY : C_PTR, C_F_POINTER

    INTEGER :: hostrank
    INTEGER(KIND=MPI_ADDRESS_KIND) :: size
    INTEGER :: disp_unit
    TYPE(C_PTR) :: baseptr
    TYPE(MPI_Win) :: win

    TYPE(MY_DATA_TYPE), POINTER :: shared_data

    ! We only want one process per host to allocate memory
    ! Set size to 0 in all processes but one
    CALL MPI_Comm_rank(hostcomm, hostrank)
    if (hostrank == 0) then
       size = 10000000 ! Put the actual data size here
    else
       size = 0
    end if
    disp_unit = 1
    CALL MPI_Win_allocate_shared(size, disp_unit, MPI_INFO_NULL, &
                                 hostcomm, baseptr, win)

    ! Obtain the location of the memory segment
    if (hostrank /= 0) then
       CALL MPI_Win_shared_query(win, 0, size, disp_unit, baseptr)
    end if

    ! baseptr can now be associated with a Fortran pointer
    ! and thus used to access the shared data
    CALL C_F_POINTER(baseptr, shared_data)

    ! Use shared_data as if it was ALLOCATE'd
    ! ...

    ! Destroy the shared memory window
    CALL MPI_Win_free(win)

Here is how this code works: it uses the MPI-3.0 facilities for shared-memory windows. MPI_WIN_ALLOCATE_SHARED allocates a chunk of shared memory in each process. Since you want to share a single block of data, it only makes sense to allocate it in a single process and not have it spread across all processes, so size is set to 0 in all but one rank when making the call. MPI_WIN_SHARED_QUERY is used to find out the address at which that shared-memory block is mapped into the virtual address space of the calling process. Once the address is known, the C pointer can be associated with a Fortran pointer using the C_F_POINTER() subroutine, and the latter can then be used to access the shared memory. When you are done, the shared memory has to be freed by destroying the shared-memory window with MPI_WIN_FREE.
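
If the shared block is meant to hold a plain array rather than a single derived-type variable, the same baseptr can be mapped onto an array pointer by passing a shape to C_F_POINTER. A small sketch, where nx, ny and shared_array are illustrative names not taken from the code above:

    INTEGER, PARAMETER :: nx = 1000, ny = 1000
    DOUBLE PRECISION, POINTER :: shared_array(:,:)

    ! On the allocating rank, the size passed to MPI_Win_allocate_shared
    ! would then be nx*ny*8 bytes (and 0 on all other ranks)
    CALL C_F_POINTER(baseptr, shared_array, [nx, ny])
    ! shared_array can now be read and written like an ordinary 2-D array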

MPI-2.2 or earlier does not provide shared-memory windows. In that case one has to use OS-dependent APIs to create shared-memory blocks, for example the POSIX-standard sequence shm_open() / ftruncate() / mmap(). A small C function callable from Fortran has to be written in order to perform these operations. See this code for some inspiration. The void * returned by mmap() can be passed directly to the Fortran code in a variable of type C_PTR, which can then be associated with a Fortran pointer.
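
For illustration only, assuming a hand-written C helper with the prototype void *alloc_shm(const char *name, size_t size) that wraps shm_open()/ftruncate()/mmap() (the helper itself is not shown and its name is made up for this sketch), the Fortran side could bind to it like this:

    USE, INTRINSIC :: ISO_C_BINDING, ONLY : C_PTR, C_CHAR, C_SIZE_T, &
                                            C_NULL_CHAR, C_F_POINTER

    INTERFACE
       ! Hypothetical C helper wrapping shm_open()/ftruncate()/mmap()
       FUNCTION alloc_shm(name, nbytes) BIND(C, NAME='alloc_shm') RESULT(ptr)
          IMPORT :: C_PTR, C_CHAR, C_SIZE_T
          CHARACTER(KIND=C_CHAR), DIMENSION(*), INTENT(IN) :: name
          INTEGER(C_SIZE_T), VALUE :: nbytes
          TYPE(C_PTR) :: ptr
       END FUNCTION alloc_shm
    END INTERFACE

    TYPE(C_PTR) :: baseptr
    TYPE(MY_DATA_TYPE), POINTER :: shared_data

    ! Every rank on the node maps the same named segment and gets back
    ! the mapped address as a C pointer
    baseptr = alloc_shm('/my_shared_block'//C_NULL_CHAR, &
                        INT(10000000, C_SIZE_T))
    CALL C_F_POINTER(baseptr, shared_data)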

+11

With this answer I want to add a complete, runnable code example (for ifort 15 and mvapich 2.1). The MPI shared-memory concept is still fairly new, and there are few code samples for Fortran in particular. It is based on Hristo's answer and a very useful email on the mvapich mailing list ( http://mailman.cse.ohio-state.edu/pipermail/mvapich-discuss/2014-June/005003.html ).

The code sample is based on the problems I encountered, and it adds to Hristo's answer in the following ways:

  • uses mpi instead of mpi_f08 (some libraries do not yet provide the full Fortran 2008 interface)
  • adds the ierr argument to the respective MPI calls
  • computes the window size explicitly as number of elements * element size
  • shows how to use C_F_POINTER to map the shared memory onto a multidimensional array
  • reminds you to use MPI_WIN_FENCE after modifying the shared memory
  • Intel MPI (5.0.1.035) needs an additional MPI_BARRIER after the MPI_WIN_FENCE, since it only guarantees that "between two calls to MPI_Win_fence, all RMA operations are completed" ( https://software.intel.com/en-us/blogs/2014/08/06/one-sided-communication )

Kudos go to Hristo and Michael Rachner.

    program sharedmemtest
      USE, INTRINSIC :: ISO_C_BINDING, ONLY : C_PTR, C_F_POINTER
      use mpi
      implicit none
      integer, parameter :: dp = selected_real_kind(14,200)
      integer :: win,win2,hostcomm,hostrank
      INTEGER(KIND=MPI_ADDRESS_KIND) :: windowsize
      INTEGER :: disp_unit,my_rank,ierr,total
      TYPE(C_PTR) :: baseptr,baseptr2
      real(dp), POINTER :: matrix_elementsy(:,:,:,:)
      integer,allocatable :: arrayshape(:)

      call MPI_INIT( ierr )

      call MPI_COMM_RANK(MPI_COMM_WORLD,my_rank,ierr)  !GET THE RANK OF ONE PROCESS
      call MPI_COMM_SIZE(MPI_COMM_WORLD,total,ierr)    !GET THE TOTAL PROCESSES OF THE COMM
      CALL MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0, MPI_INFO_NULL, hostcomm,ierr)
      CALL MPI_Comm_rank(hostcomm, hostrank,ierr)

      ! Gratefully based on: http://stackoverflow.com/questions/24797298/mpi-fortran-code-how-to-share-data-on-node-via-openmp
      ! and https://gcc.gnu.org/onlinedocs/gfortran/C_005fF_005fPOINTER.html

      ! We only want one process per host to allocate memory
      ! Set size to 0 in all processes but one
      allocate(arrayshape(4))
      arrayshape = (/ 10,10,10,10 /)
      if (hostrank == 0) then
         windowsize = int(10**4,MPI_ADDRESS_KIND)*8_MPI_ADDRESS_KIND !*8 for double ! Put the actual data size here
      else
         windowsize = 0_MPI_ADDRESS_KIND
      end if
      disp_unit = 1
      CALL MPI_Win_allocate_shared(windowsize, disp_unit, MPI_INFO_NULL, hostcomm, baseptr, win, ierr)

      ! Obtain the location of the memory segment
      if (hostrank /= 0) then
         CALL MPI_Win_shared_query(win, 0, windowsize, disp_unit, baseptr, ierr)
      end if

      ! baseptr can now be associated with a Fortran pointer
      ! and thus used to access the shared data
      CALL C_F_POINTER(baseptr, matrix_elementsy, arrayshape)

      !!! your code here!
      !!! sample below
      if (hostrank == 0) then
         matrix_elementsy = 0.0_dp
         matrix_elementsy(1,2,3,4) = 1.0_dp
      end if
      CALL MPI_WIN_FENCE(0, win, ierr)

      print *,"my_rank=",my_rank,matrix_elementsy(1,2,3,4),matrix_elementsy(1,2,3,5)
      !!! end sample code

      call MPI_WIN_FENCE(0, win, ierr)
      call MPI_BARRIER(MPI_COMM_WORLD,ierr)
      call MPI_Win_free(win,ierr)
      call MPI_FINALIZE(ierr)

    end program
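
As a quick sanity check (assuming a typical MPI installation): compile with the MPI wrapper compiler, e.g. mpif90 or mpiifort, and launch a few ranks on a single node with mpirun. Every rank should then print 1.0 and 0.0 for the two sampled elements, showing that all ranks on the node see the value written by the node's rank 0.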
+5

In the spirit of adding to the Fortran shared-memory MPI examples, I would like to extend ftiaronsem's code to include a loop, so that the behavior of MPI_Win_fence and MPI_Barrier becomes clearer (at least it did for me).

In particular, try running the code with either or both of the MPI_Win_fence and MPI_Barrier calls inside the loop commented out to see the effect, or swap their order.

Removing the MPI_Win_fence allows the write statement to display memory that has not yet been updated.

Removing the MPI_Barrier allows other processes to race ahead into the next iteration and change the memory before the current process has had a chance to write it.

The previous answers really helped me implement the shared memory paradigm in my MPI code. Thanks.

    program sharedmemtest
      USE, INTRINSIC :: ISO_C_BINDING, ONLY : C_PTR, C_F_POINTER
      use mpi
      implicit none
      integer, parameter :: dp = selected_real_kind(14,200)
      integer :: win,win2,hostcomm,hostrank
      INTEGER(KIND=MPI_ADDRESS_KIND) :: windowsize
      INTEGER :: disp_unit,my_rank,ierr,total, i
      TYPE(C_PTR) :: baseptr,baseptr2
      real(dp), POINTER :: matrix_elementsy(:,:,:,:)
      integer,allocatable :: arrayshape(:)

      call MPI_INIT( ierr )

      call MPI_COMM_RANK(MPI_COMM_WORLD,my_rank, ierr) !GET THE RANK OF ONE PROCESS
      call MPI_COMM_SIZE(MPI_COMM_WORLD,total,ierr)    !GET THE TOTAL PROCESSES OF THE COMM
      CALL MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0, MPI_INFO_NULL, hostcomm,ierr)
      CALL MPI_Comm_rank(hostcomm, hostrank,ierr)

      ! Gratefully based on: http://stackoverflow.com/questions/24797298/mpi-fortran-code-how-to-share-data-on-node-via-openmp
      ! and https://gcc.gnu.org/onlinedocs/gfortran/C_005fF_005fPOINTER.html

      ! We only want one process per host to allocate memory
      ! Set size to 0 in all processes but one
      allocate(arrayshape(4))
      arrayshape = (/ 10,10,10,10 /)
      if (hostrank == 0) then
         windowsize = int(10**4,MPI_ADDRESS_KIND)*8_MPI_ADDRESS_KIND !*8 for double ! Put the actual data size here
      else
         windowsize = 0_MPI_ADDRESS_KIND
      end if
      disp_unit = 1
      CALL MPI_Win_allocate_shared(windowsize, disp_unit, MPI_INFO_NULL, hostcomm, baseptr, win, ierr)

      ! Obtain the location of the memory segment
      if (hostrank /= 0) then
         CALL MPI_Win_shared_query(win, 0, windowsize, disp_unit, baseptr, ierr)
      end if

      ! baseptr can now be associated with a Fortran pointer
      ! and thus used to access the shared data
      CALL C_F_POINTER(baseptr, matrix_elementsy, arrayshape)

      !!! your code here!
      !!! sample below
      if (hostrank == 0) then
         matrix_elementsy = 0.0_dp
      endif
      call MPI_WIN_FENCE(0, win, ierr)

      do i = 1, 15
         if (hostrank == 0) then
            matrix_elementsy(1,2,3,4) = i * 1.0_dp
            matrix_elementsy(1,2,2,4) = i * 2.0_dp
         elseif ((hostrank > 5) .and. (hostrank < 11)) then
            ! code for non-root nodes to do something different
            matrix_elementsy(1,2,hostrank, 4) = hostrank * 1.0 * i
         endif
         call MPI_WIN_FENCE(0, win, ierr)
         write(*,'(A, I4, I4, 10F7.1)') "my_rank=", my_rank, i, matrix_elementsy(1,2,:,4)
         call MPI_BARRIER(MPI_COMM_WORLD, ierr)
      enddo
      !!! end sample code

      call MPI_WIN_FENCE(0, win, ierr)
      call MPI_BARRIER(MPI_COMM_WORLD,ierr)
      call MPI_Win_free(win,ierr)
      call MPI_FINALIZE(IERR)

    end program
+1
