Automatic vectorization does not work

I am trying to get my code to auto-vectorize, but it does not work.

    int _tmain(int argc, _TCHAR* argv[])
    {
        const int N = 4096;
        float x[N];
        float y[N];
        float sum = 0;

        //create random values for x and y
        for (int i = 0; i < N; i++)
        {
            x[i] = rand() >> 1;
            y[i] = rand() >> 1;
        }

        for (int i = 0; i < N; i++){
            sum += x[i] * y[i];
        }
    }

Neither loop is vectorized here, but I am only interested in the second loop.

I am using Visual Studio Express 2013 and compiling with /O2 and /Qvec-report:2 (to see whether the loop has been vectorized). When I compile, I get the following message:

    --- Analyzing function: main
    c:\users\...\documents\visual studio 2013\projects\intrin3\intrin3\intrin3.cpp(28) : info C5002: loop not vectorized due to reason '1200'
    c:\users\...\documents\visual studio 2013\projects\intrin3\intrin3\intrin3.cpp(41) : info C5002: loop not vectorized due to reason '1305'

Reason '1305', according to the documentation, says that "the compiler cannot recognize the correct vectorization information for this loop." I'm not quite sure what that means. Any ideas?

After splitting the second loop into two loops:

    for (int i = 0; i < N; i++){
        sumarray[i] = x[i] * y[i];
    }

    for (int i = 0; i < N; i++){
        sum += sumarray[i];
    }

Now the first of these loops vectorizes but the second does not, again with reason code 1305.

c++ optimization vectorization sse simd
2 answers

Reason 1305 is reported because the value of sum is never used, so the optimizer does not vectorize the loop. Just adding printf("%f\n", sum) fixes this. But then you get a new reason code, 1105: "Loop includes an unrecognized reduction operation." To fix this, you need to set /fp:fast (see https://stackoverflow.com/a/330947/).

The reason is that floating-point arithmetic is not associative, and reductions using SIMD or MIMD (i.e. using multiple threads) need the operation to be associative, because they add the elements in a different order than the scalar loop does. With a looser floating-point model, the compiler is allowed to do the reduction.
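To make the non-associativity concrete, here is a minimal sketch. The function names sum_left and sum_right are made up for this illustration, and the volatile temporaries are only there so the intermediate result is rounded to float and the compiler cannot reorder the additions.

```cpp
// Sum three floats left to right: (a + b) + c.
float sum_left(float a, float b, float c) {
    volatile float t = a + b;   // force the intermediate to be rounded to float
    return t + c;
}

// Sum the same three floats right to left: a + (b + c).
float sum_right(float a, float b, float c) {
    volatile float t = b + c;   // force the intermediate to be rounded to float
    return a + t;
}
```

With a = 1e8f, b = -1e8f and c = 1.0f, sum_left gives 1.0f, while sum_right gives 0.0f because -1e8f + 1.0f rounds back to -1e8f. A vectorized reduction changes the order of the additions in the same way, which is why the compiler refuses to do it under /fp:precise.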

I just tested it with the following code: with the default /fp:precise the loop does not vectorize, and with /fp:fast it does.

    #include <stdio.h>

    int main() {
        const int N = 4096;
        float x[N];
        float y[N];
        float sum = 0;

        for (int i = 0; i < N; i++){
            sum += x[i] * y[i];
        }
        printf("sum %f\n", sum);
    }
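As a rough picture of what a vectorized reduction looks like, the sketch below writes the reordering out by hand in scalar code. It is not what the compiler actually emits. The names dot_scalar and dot_four_lanes are hypothetical, and the four accumulators stand in for the four float lanes of an SSE register.

```cpp
#include <cstddef>

// Plain dot product: one accumulator, strict left-to-right summation order.
float dot_scalar(const float* x, const float* y, std::size_t n) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}

// Same dot product with four independent partial sums, one per "lane".
// The additions happen in a different order than in dot_scalar.
float dot_four_lanes(const float* x, const float* y, std::size_t n) {
    float lane[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        for (int k = 0; k < 4; ++k) {
            lane[k] += x[i + k] * y[i + k];
        }
    }
    // Horizontal add of the four partial sums.
    float sum = (lane[0] + lane[1]) + (lane[2] + lane[3]);
    // Handle the leftover elements when n is not a multiple of 4.
    for (; i < n; ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}
```

For inputs where every intermediate result is exactly representable, both functions return the same value. In general they can differ in the last bits, and that difference is what /fp:fast gives the compiler permission to introduce.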

As for your question about the loop with the rand() function: rand() is not a SIMD function, so that loop cannot be vectorized. You would need to find a SIMD rand() function, and I do not know of one. An alternative is to pre-compute an array of random numbers and use that array instead. In any case, rand() produces poor-quality random numbers and is only useful for toy cases. Consider using the Mersenne Twister PRNG.
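A minimal sketch of the pre-compute approach, assuming a C++11 standard library with <random>. The helper name random_floats and the choice of range are mine, not something from the question.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Fill a vector with n pseudo-random floats, nominally in [0, 1),
// generated by the Mersenne Twister engine std::mt19937.
std::vector<float> random_floats(std::size_t n, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_real_distribution<float> dist(0.0f, 1.0f);
    std::vector<float> values(n);
    for (std::size_t i = 0; i < n; ++i) {
        values[i] = dist(gen);   // scalar loop, done once up front
    }
    return values;
}
```

The generation loop itself still runs scalar, but the hot loop that consumes the values then only reads from plain arrays (for example via values.data()), which is the pattern the vectorizer can handle.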


One problem may be that your stack allocations are not necessarily aligned by your compiler. If your compiler supports C++11, you can use:

    alignas(16) float x[N];
    alignas(16) float y[N];

This explicitly requests 16-byte-aligned memory, which most SSE operations require.
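If you want to confirm that an array really ends up on a 16-byte boundary, a small check like the following sketch can help. The helper names is_aligned and aligned_stack_array_ok are hypothetical. It assumes a compiler that implements C++11 alignas; as far as I know Visual Studio 2013 does not, and there the MSVC-specific __declspec(align(16)) would be the spelling to try.

```cpp
#include <cstdint>

// True when p is a multiple of boundary (boundary must be a power of two).
bool is_aligned(const void* p, std::uintptr_t boundary) {
    return (reinterpret_cast<std::uintptr_t>(p) & (boundary - 1)) == 0;
}

// Declares a 16-byte-aligned local array and reports whether its address
// really is a multiple of 16.
bool aligned_stack_array_ok() {
    alignas(16) float x[4096];
    x[0] = 0.0f;   // touch the array so it is not optimized away
    return is_aligned(x, 16);
}
```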


EDIT:

Even if alignment is not the problem and your compiler is able to vectorize unaligned code, you should still make this change, since unaligned SSE operations can be much slower than their aligned counterparts.
