Atomic Minimum on x86 using OpenMP

Does OpenMP support an atomic minimum for C++11? If OpenMP has no portable way to do this, is there a way to do it using x86 or amd64 instructions?

I didn't find anything for C++ in the OpenMP specification, but the Fortran version seems to support it. See section 2.8.5 of v3.1 for details. For C/C++ it states

binop is one of +, *, -, /, &, ^, |, <<, or >>.

whereas for Fortran it states

intrinsic_procedure_name is one of MAX, MIN, IAND, IOR or IEOR.

If you want more context: I am looking for a way to do the following without a mutex:

    vector<omp_lock_t> lock;
    vector<int> val;

    #pragma omp parallel
    {
        // ...
        int x = ...;
        int y = ...;
        if (y < val[x]) {
            omp_set_lock(&lock[x]);
            if (y < val[x])
                val[x] = y;
            omp_unset_lock(&lock[x]);
        }
    }

I know that the minimum can be computed with a reduction, and that there are cases where a reduction greatly outperforms any atomic-minimum approach. However, I also know that this is not the case in my situation.

EDIT: One alternative, which is slightly faster in my case:

    int x = ...;
    int y = ...;
    while (y < val[x])
        val[x] = y;

However, this is not an atomic operation.

All recent GPUs have this feature, and I miss it on the CPU. (See atom_min in OpenCL.)

1 answer

The OpenMP specification for C/C++ does not support an atomic minimum, and neither does C++11.

I assume that in your algorithm x can be any valid index, regardless of which thread computes it. I would suggest changing the algorithm so that each thread uses its own val array, followed by a final merge (an element-wise minimum) at the end, which can also be parallelized by index. This completely eliminates locks and atomics and gives you the advantage of per-thread data, which avoids false sharing of cache lines. In other words, it should be faster.

