In your code example, only the outer loop is parallel. You can test this by printing omp_get_thread_num() in the inner loop: for a given i, the thread number is always the same (of course this test is suggestive rather than conclusive, since different runs can give different results). For example, with:
```c
#include <stdio.h>
#include <omp.h>

#define dimension 4

int main() {
    #pragma omp parallel for
    for (int i = 0; i < dimension; i++)
        for (int j = 0; j < dimension; j++)
            printf("i=%d, j=%d, thread = %d\n",
                   i, j, omp_get_thread_num());
}
```
I get:
```
i=1, j=0, thread = 1
i=3, j=0, thread = 3
i=2, j=0, thread = 2
i=0, j=0, thread = 0
i=1, j=1, thread = 1
i=3, j=1, thread = 3
i=2, j=1, thread = 2
i=0, j=1, thread = 0
i=1, j=2, thread = 1
i=3, j=2, thread = 3
i=2, j=2, thread = 2
i=0, j=2, thread = 0
i=1, j=3, thread = 1
i=3, j=3, thread = 3
i=2, j=3, thread = 2
i=0, j=3, thread = 0
```
As for the rest of your code, you might want to add more details to a new question (it's hard to tell from a small sample), but, for example, you cannot put private(j) when j is only declared later; it is automatically private in my example above. I assume diff is a variable we cannot see in your sample. Also, the loop variable i is automatically made private (from the version 2.5 spec; the wording is the same in the version 3.0 spec):
The loop iteration variable in the for-loop of a for or parallel for construct is private in that construct.
Edit: all of the above applies to the code both you and I showed, but you may be interested in the following. As of OpenMP version 3.0 (available in gcc 4.4, for example, but not in 4.3) there is a collapse clause, which lets you write the code exactly as you have it, but with #pragma omp parallel for collapse(2) to parallelize both for loops (see the specification).
Edit: OK, I downloaded gcc 4.5.0 and ran the code above with collapse(2), and got the following output, showing that the inner loop is now parallelized:
```
i=0, j=0, thread = 0
i=0, j=2, thread = 1
i=1, j=0, thread = 2
i=2, j=0, thread = 4
i=0, j=1, thread = 0
i=1, j=2, thread = 3
i=3, j=0, thread = 6
i=2, j=2, thread = 5
i=3, j=2, thread = 7
i=0, j=3, thread = 1
i=1, j=1, thread = 2
i=2, j=1, thread = 4
i=1, j=3, thread = 3
i=3, j=1, thread = 6
i=2, j=3, thread = 5
i=3, j=3, thread = 7
```
The comments here (search for "Workarounds") are also relevant if you want workarounds for parallelizing both loops under version 2.5, but the version 2.5 specification quoted above is pretty clear on the matter (see the non-conforming examples in section A.35).