Parallelize a nested loop to symmetry everything - just a comparison with C ++ / OpenMP

I have a simple problem comparing all elements with each other. The comparison itself is symmetrical, so it does not need to be done twice.

The following code example shows what I'm looking for, showing the indices of the available elements:

int n = 5; for (int i = 0; i < n; i++) { for (int j = i + 1; j < n; j++) { printf("%d %d\n", i,j); } } 

Output:

 0 1 0 2 0 3 0 4 1 2 1 3 1 4 2 3 2 4 3 4 

So, each element is compared with each other once. When I want to parallelize this code, I have a problem that first of all I have to stick to dynamic planning, because the calculation time of each iteration varies greatly. And I cannot use collapse due to the fact that nested iterations are indices, dependent on the outer loop.

Using #pragma omp parallel for schedule(dynamic, 3) for the outer loop can lead to single-core executions at the end, while using this for the inner loop can lead to such execution in every iteration of the outer loop.

Is there a more complicated way to do this / parallelize it?

+6
source share
3 answers

I did not think about it completely, but you can try this approach:

 int total = n * (n-1) / 2; // total number of combinations #pragma omp parallel for for (int k = 0; k < total; ++k) { int i = first(k, n); int j = second(k, n, i); printf("%d %d\n", i,j); } int first(int k, int n) { int i = 0; for (; k >= n - 1; ++i) { k -= n - 1; n -= 1; } return i; } int second(int k, int n, int i) { int t = i * (2*n - i - 1) / 2; return (t == 0 ? k + i + 1 : (k % t) + i + 1); } 
+2
source

Indeed, the OpenMP standard says about the collapse that:

An iteration counter for each associated loop is computed before entering the outermost loop. If the execution of any associated loop changes any of the values ​​used to calculate any number of iterations, then the behavior is unspecified.

Thus, you cannot roll your loops, which would be the easiest way. However, since you are not particularly interested in the procedure for calculating index pairs, you can slightly modify your loops as follows:

 for ( int i = 0; i < n; i++ ) { for ( int j = 0; j < n / 2; j++ ) { int ii, jj; if ( j < i ) { ii = n - 1 - i; jj = n - 1 - j; } else { ii = i; jj = j + 1; } printf( "%d %d\n", ii, jj ); } } 

This should give you all the pairs you want, in a slightly distorted order, but with fixed iteration constraints that allow balanced parallelization and even loop folding if you want. Simply, if n is even, the column corresponding to n / 2 will be displayed twice, so you live with it or you slightly modify the algorithm to avoid this ...

0
source

I used to have good results with the following:

 #pragma omp parallel for collapse(2) for (int i = 0; i < n; ++i) { for (int j = 0; j < n; ++j) { if (j <= i) continue; printf("%d %d\n", i, j); } } 

Remember that printf does not perform a parallel workload, so it would be better if you profiled it in your specific work. You can try adding schedule(dynamic, 10) or something larger than 10 , depending on how many iterations you perform.

0
source

All Articles