I am new to OpenMP and I am trying to parallelize the following code:
    #pragma omp parallel for
    for (int k = 0; k < m; k++) {
        for (int j = n - 1; j >= 0; j--) {
            outX[k + j*m] = inB2[j + n*k] / inA2[j*n + j];
            for (int i = 0; i < j; i++) {
                inB2[k*n + i] -= inA2[i + n*j] * outX[k + m*j];
            }
        }
    }
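The indexing implies a particular data layout, so here is a self-contained sketch of the same kernel wrapped in a function, for reference. The name `back_substitute`, the element type `double`, and the layout comments are assumptions inferred from the indices: `inA2` is an n×n upper-triangular matrix stored column by column, right-hand side k occupies `inB2[k*n]` to `inB2[k*n + n - 1]` (and is overwritten), and x(j) for right-hand side k ends up in `outX[k + j*m]`.

```cpp
#include <cassert>

// Hypothetical wrapper around the question's kernel: solves A * X = B for m
// right-hand sides by back substitution, where A is n x n upper triangular.
// Layout (an assumption inferred from the indexing):
//   A(i, j) = inA2[i + n*j]   (column-major)
//   B(i, k) = inB2[i + n*k]   (each right-hand side contiguous, overwritten)
//   X(j, k) = outX[k + m*j]
void back_substitute(int n, int m,
                     const double* __restrict inA2,
                     double* __restrict inB2,
                     double* __restrict outX) {
    assert(n >= 0 && m >= 0);
    // Right-hand sides are independent, so the outer loop parallelizes cleanly.
    #pragma omp parallel for
    for (int k = 0; k < m; k++) {
        for (int j = n - 1; j >= 0; j--) {
            outX[k + j*m] = inB2[j + n*k] / inA2[j*n + j];
            // Remove the contribution of x(j) from the rows above row j.
            for (int i = 0; i < j; i++) {
                inB2[k*n + i] -= inA2[i + n*j] * outX[k + m*j];
            }
        }
    }
}
```

For example, with A = [[2, 1], [0, 4]] stored as {2, 0, 1, 4} and the right-hand sides (4, 8) and (2, 4) stored as {4, 8, 2, 4}, the solutions are (1, 2) and (0.5, 1), so `outX` ends up as {1, 0.5, 2, 1}.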
Parallelizing the outer loop is straightforward, but to optimize further I would like to parallelize the innermost loop (the loop over i) as well. However, when I try it like this:
    #pragma omp parallel for
    for (int i = 0; i < j; i++) {
        inB2[k*n + i] -= inA2[i + n*j] * outX[k + m*j];
    }
the compiler does not vectorize the inner loop properly (-fopt-info-vec reports "loop versioned for vectorization because of possible aliasing"), which makes it run more slowly. I compiled it with:

    gcc -ffast-math -std=c++11 -fopenmp -O3 -msse2 -funroll-loops -g -fopt-info-vec prog.cpp
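One plausible explanation, offered as a hedge rather than a verified account of GCC internals: when the inner loop becomes its own `parallel for`, GCC outlines its body into a separate function and passes `inB2`, `inA2` and `outX` through a shared structure, so the `__restrict` information from the enclosing function may not survive and the vectorizer falls back to a runtime alias check. A common workaround is to move the update into a small helper whose own parameters are `__restrict`-qualified and to load `outX[k + m*j]` into a local scalar first. The names `subtract_scaled_column` and `back_substitute_helper` are hypothetical, and the layout is the one assumed from the question's indexing.

```cpp
#include <cassert>

// Hypothetical helper: b[i] -= a_col[i] * x for i in [0, len).
// __restrict promises that b and a_col do not overlap, and x is passed by
// value, so the compiler does not need to reload it after each store to b.
static inline void subtract_scaled_column(double* __restrict b,
                                          const double* __restrict a_col,
                                          double x, int len) {
    for (int i = 0; i < len; i++) {
        b[i] -= a_col[i] * x;
    }
}

// The question's kernel rewritten to call the helper; the inner loop stays
// serial here, so threads come only from the outer loop.
void back_substitute_helper(int n, int m,
                            const double* __restrict inA2,
                            double* __restrict inB2,
                            double* __restrict outX) {
    assert(n >= 0 && m >= 0);
    #pragma omp parallel for
    for (int k = 0; k < m; k++) {
        for (int j = n - 1; j >= 0; j--) {
            const double xj = inB2[j + n*k] / inA2[j*n + j];
            outX[k + j*m] = xj;
            // Rows 0..j-1 of right-hand side k, column j of A.
            subtract_scaled_column(inB2 + k*n, inA2 + n*j, xj, j);
        }
    }
}
```

Whether this removes the versioning message depends on the GCC version, so it is worth re-checking the -fopt-info-vec output after the change.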
Thanks for any advice!
EDIT: I use the __restrict keyword on the array pointers.
EDIT2: Interestingly, when I keep the pragma only on the inner loop and remove it from the outer one, gcc does vectorize it. So the problem only occurs when I try to parallelize both loops.
EDIT3: The compiler does vectorize the loop when I use #pragma omp parallel for simd, but it is still slower than the version without a parallel inner loop.
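A likely reason the nested version stays slower: the inner `parallel for` sits inside the `j` loop, so a parallel region is opened and closed n times for every k. Nested parallelism is typically inactive by default, in which case each inner region runs with a team of one thread and adds only fork/join overhead; if nesting is active, the inner teams compete for the cores the outer loop already uses. A common alternative is to keep the threads on the outer loop and request only vectorization on the inner loop with `#pragma omp simd` (OpenMP 4.0, GCC 4.9 or later). The sketch below assumes the same layout as the question; the function name is hypothetical.

```cpp
#include <cassert>

// Hypothetical name; layout assumed from the question's indexing:
//   A(i, j) = inA2[i + n*j], B(i, k) = inB2[i + n*k] (overwritten), X(j, k) = outX[k + m*j]
void back_substitute_simd(int n, int m,
                          const double* __restrict inA2,
                          double* __restrict inB2,
                          double* __restrict outX) {
    assert(n >= 0 && m >= 0);
    // Threads: a single parallel region splits the independent right-hand sides.
    #pragma omp parallel for
    for (int k = 0; k < m; k++) {
        double* __restrict b = inB2 + k*n;      // right-hand side k
        for (int j = n - 1; j >= 0; j--) {
            const double xj = b[j] / inA2[j*n + j];
            outX[k + j*m] = xj;
            const double* a_col = inA2 + n*j;   // column j of A
            // SIMD lanes only, no new threads: the i iterations are independent.
            #pragma omp simd
            for (int i = 0; i < j; i++) {
                b[i] -= a_col[i] * xj;
            }
        }
    }
}
```

If m is small compared with the number of cores, this leaves some cores idle; in that case splitting the i loop across threads may pay off, but it is usually cheaper to open one enclosing `parallel` region and use `#pragma omp for` on the i loop inside it than to start a new region for every j.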
c++ vectorization openmp
Honza dejdar