I am trying to use vectorization in my compiler (Microsoft Visual Studio 2013). One of the problems I am facing is that it does not want to use AVX2. While investigating this, I built the following example, which adds two arrays of 16 numbers element by element, each number being 16 bits wide.
int16_t input1[16] = {0};
int16_t input2[16] = {0};
...
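The rest of the loop is elided above. Below is a minimal sketch of what it presumably looks like, assuming the body is output1[i] = input1[i] + input2[i] and a build with /O2 /arch:AVX2. The names add_loop and kCount are hypothetical, not taken from the original code.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical reconstruction of the elided loop (not the author's exact
// code): element-wise sum of two arrays of 16 signed 16-bit integers.
// Assumed MSVC build flags: /O2 /arch:AVX2
static const std::size_t kCount = 16;

void add_loop(const std::int16_t input1[kCount],
              const std::int16_t input2[kCount],
              std::int16_t output1[kCount])
{
    for (std::size_t i = 0; i < kCount; ++i) {
        // The operands are promoted to int; the cast truncates the sum back
        // to 16 bits (wrap-around on MSVC, GCC and Clang), like vpaddw does.
        output1[i] = static_cast<std::int16_t>(input1[i] + input2[i]);
    }
}
```

Adding /Qvec-report:2 to the build should make MSVC list each loop as vectorized or not (with a reason code), which is a quick way to compare what different flag combinations do with this loop.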
The compiler vectorizes this code, but only with 128-bit (SSE-width) instructions on xmm registers:
vmovdqu xmm1, xmmword ptr [rbp+rax]
lea rax, [rax+10h]
vpaddw xmm1, xmm1, xmmword ptr [rbp+rax+10h]
vmovdqu xmmword ptr [rbp+rax+30h], xmm1
dec rcx
jne main+0b0h
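To make the structure of that listing explicit, here is a hand-written sketch (not compiler output) using SSE2 intrinsics from emmintrin.h: each iteration adds 8 int16_t lanes held in one 128-bit register, so 16 elements presumably take two trips through the loop, matching the 16-byte (10h) stride. The function name add_sse_width and the macro HAVE_SSE2_SKETCH are hypothetical; a scalar fallback keeps the sketch buildable on non-x86 targets.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define HAVE_SSE2_SKETCH 1
#endif

// Sketch of what the 128-bit listing does: 8 int16_t lanes per iteration,
// so 16 elements need two iterations of 16 bytes each.
void add_sse_width(const std::int16_t* input1, const std::int16_t* input2,
                   std::int16_t* output1)
{
    for (std::size_t i = 0; i < 16; i += 8) {
#if HAVE_SSE2_SKETCH
        __m128i a = _mm_loadu_si128(reinterpret_cast<const __m128i*>(input1 + i));
        __m128i b = _mm_loadu_si128(reinterpret_cast<const __m128i*>(input2 + i));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(output1 + i), _mm_add_epi16(a, b));
#else
        // Non-x86 fallback: same 8-element step, done element by element.
        for (std::size_t j = i; j < i + 8; ++j)
            output1[j] = static_cast<std::int16_t>(input1[j] + input2[j]);
#endif
    }
}
```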
To make sure that the compiler is able to generate AVX2 code at all, I wrote the same calculation as follows:
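The snippet itself is not shown here. A hedged sketch of an AVX2 intrinsics version of the same calculation, assuming the intrinsics from immintrin.h were used, could look like the following: all 16 int16_t values fit in one 256-bit ymm register, which gives a single load / vpaddw / store sequence. The function name add_avx2 is hypothetical. __AVX2__ is defined by MSVC under /arch:AVX2 and by GCC/Clang under -mavx2; the scalar fallback is an addition so the sketch still builds without AVX2 enabled.

```cpp
#include <cstdint>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

// Hypothetical intrinsics version: one 256-bit register holds all
// 16 int16_t elements, so the whole sum is one vpaddw on ymm registers.
void add_avx2(const std::int16_t* input1, const std::int16_t* input2,
              std::int16_t* output2)
{
#if defined(__AVX2__)
    __m256i a = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(input1));
    __m256i b = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(input2));
    _mm256_storeu_si256(reinterpret_cast<__m256i*>(output2), _mm256_add_epi16(a, b));
#else
    // Scalar fallback when the translation unit is not built for AVX2.
    for (int i = 0; i < 16; ++i)
        output2[i] = static_cast<std::int16_t>(input1[i] + input2[i]);
#endif
}
```

Unlike the loop, this version requires an AVX2-capable CPU whenever __AVX2__ is defined at build time, which is the compatibility cost mentioned in the question.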
I verified that the two versions are equivalent (i.e., output1 is equal to output2 after both have run).
For this second version, the compiler does emit AVX2 instructions:
vmovdqu ymm1, ymmword ptr [input2]
vpaddw ymm1, ymm1, ymmword ptr [rbp]
vmovdqu ymmword ptr [output2], ymm1
However, I do not want to rewrite my code with intrinsics: written as a loop, it is much more natural, it stays compatible with old (SSE-only) processors, and it has other advantages.
So, how can I change my example so that the compiler vectorizes it with AVX2?
c++ c vectorization visual-studio-2013 avx2
anatolyg