How to auto-vectorize strided writes using GCC?

When compiling with GCC 5.2 using -std=c99, -O3 and -mavx2, the following code sample is auto-vectorized:

    #include <stdint.h>

    void test(uint32_t *restrict a, uint32_t *restrict b) {
        uint32_t *a_aligned = __builtin_assume_aligned(a, 32);
        uint32_t *b_aligned = __builtin_assume_aligned(b, 32);

        for (int i = 0; i < (1L << 10); i += 2) {
            a_aligned[i] = 42 * b_aligned[i];
            a_aligned[i+1] = 3 * a_aligned[i+1];
        }
    }

But the following code sample is not auto-vectorized:

    #include <stdint.h>

    void test(uint32_t *restrict a, uint32_t *restrict b) {
        uint32_t *a_aligned = __builtin_assume_aligned(a, 32);
        uint32_t *b_aligned = __builtin_assume_aligned(b, 32);

        for (int i = 0; i < (1L << 10); i += 2) {
            a_aligned[i] = 42 * b_aligned[i];
            a_aligned[i+1] = a_aligned[i+1];
        }
    }

The only difference between the samples is the scaling factor applied to a_aligned[i+1].

The same happens with GCC 4.8, 4.9 and 5.1. Adding a volatile qualifier to the declaration of a_aligned prevents auto-vectorization entirely. For us, the first sample consistently runs faster than the second, with a more pronounced speedup for smaller types (for example, uint8_t instead of uint32_t).

Is there a way for the second code sample to be auto-vectorized using GCC?

1 answer

The following version vectorizes, but it is ugly if you ask me ...

    #include <stdint.h>

    void test(uint32_t *a, uint32_t *aa, uint32_t *restrict b) {
        #pragma omp simd aligned(a,aa,b:32)
        for (int i = 0; i < (1L << 10); i += 2) {
            a[i] = 2 * b[i];
            a[i+1] = aa[i+1];
        }
    }

Compile with -fopenmp and call it as test(a, a, b), so that a and aa refer to the same array.
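To make the calling convention concrete, here is a minimal sketch of how the function might be driven and checked. The buffer names buf_a and buf_b, the helper check_test, the macro N, and the fill values are hypothetical and not part of the original answer. The buffers are aligned to 32 bytes with GCC's aligned attribute, because the aligned(...) clause is a promise to the compiler that the code must actually keep.

```c
#include <assert.h>
#include <stdint.h>

#define N (1 << 10)  /* same trip count as the loop in the answer */

/* Kernel from the answer: even elements are written from b,
   odd elements are copied through the second pointer aa. */
void test(uint32_t *a, uint32_t *aa, uint32_t *restrict b)
{
#pragma omp simd aligned(a, aa, b : 32)
    for (int i = 0; i < N; i += 2) {
        a[i] = 2 * b[i];
        a[i + 1] = aa[i + 1];
    }
}

/* Hypothetical test buffers, 32-byte aligned to honor the aligned clause. */
static uint32_t buf_a[N] __attribute__((aligned(32)));
static uint32_t buf_b[N] __attribute__((aligned(32)));

/* Hypothetical helper: fill the buffers, call test(a, a, b), verify. */
void check_test(void)
{
    for (int i = 0; i < N; i++) {
        buf_a[i] = 1000u + (uint32_t)i;
        buf_b[i] = (uint32_t)i;
    }

    test(buf_a, buf_a, buf_b);  /* a and aa are the same array */

    for (int i = 0; i < N; i += 2) {
        assert(buf_a[i] == 2u * (uint32_t)i);                /* even: 2 * b[i] */
        assert(buf_a[i + 1] == 1000u + (uint32_t)(i + 1));   /* odd: unchanged */
    }
}
```

Building this with something like gcc -std=c99 -O3 -mavx2 -fopenmp and adding -fopt-info-vec should make GCC report which loops it vectorized; whether this particular loop is reported depends on the GCC version. Without -fopenmp the pragma is ignored and the code still produces the same results, only without the vectorization hint.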
