When compiling with GCC 5.2 using -std=c99, -O3, and -mavx2, the following code sample auto-vectorizes (build here):
But the following code sample does not auto-vectorize (build here):
The only difference between the samples is the scaling factor applied to a_aligned[i+1].
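As a hypothetical sketch of the kind of loop pair this describes (not the code behind the build links), assume a loop unrolled by two that writes a_aligned[i] and a_aligned[i+1] from a second buffer. The function names, the source array b, the length n, and the factors 2 and 3 are all assumptions made for illustration. Whether GCC vectorizes either function depends on the compiler version and flags.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical variant 1: both elements of each pair use the same factor. */
void scale_uniform(uint32_t *restrict a_aligned,
                   const uint32_t *restrict b, size_t n)
{
    for (size_t i = 0; i + 1 < n; i += 2) {
        a_aligned[i]     = 2 * b[i];
        a_aligned[i + 1] = 2 * b[i + 1];
    }
}

/* Hypothetical variant 2: identical, except for the factor on a_aligned[i+1]. */
void scale_mixed(uint32_t *restrict a_aligned,
                 const uint32_t *restrict b, size_t n)
{
    for (size_t i = 0; i + 1 < n; i += 2) {
        a_aligned[i]     = 2 * b[i];
        a_aligned[i + 1] = 3 * b[i + 1]; /* the only differing line */
    }
}
```

One way to compare the two is to compile with -fopt-info-vec, which makes GCC report the loops it vectorized.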
This also applies to GCC 4.8, 4.9, and 5.1. Adding a volatile declaration to a_aligned inhibits auto-vectorization completely. The first sample consistently runs faster than the second for us, with a more pronounced speedup for smaller types (for example, uint8_t instead of uint32_t).
Is there a way for the second code sample to be auto-vectorized using GCC?