GCC's vector extensions offer a nice, reasonably portable way of accessing some SIMD instructions on different hardware architectures without resorting to hardware-specific intrinsics (or auto-vectorization).
A real use case is calculating a simple additive checksum. The one thing that isn't clear is how to safely load data into a vector.
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    typedef char v16qi __attribute__ ((vector_size(16)));

    static uint8_t checksum(uint8_t *buf, size_t size)
    {
        assert(size%16 == 0);
        uint8_t sum = 0;
        v16qi vec = {0};

        for (size_t i=0; i<(size/16); i++) {
            // XXX: Yuck! Is there a better way?
            // Offset in bytes first, then cast to a vector pointer.
            vec += *((v16qi*) (buf+i*16));
        }

        // Sum up the vector
        sum = vec[0] + vec[1] + vec[2] + vec[3] + vec[4] + vec[5] + vec[6] + vec[7]
            + vec[8] + vec[9] + vec[10] + vec[11] + vec[12] + vec[13] + vec[14] + vec[15];

        return sum;
    }
Casting a pointer to the vector type appears to work, but I'm worried that it could explode in a horrible way if the SIMD hardware expects vector types to be correctly aligned.
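To make the alignment worry concrete, here is a minimal sketch of a debug check that could run before the pointer cast. The helper names `is_v16qi_aligned` and `check_vector_buffer` are hypothetical, and the sketch assumes GCC's `__alignof__` extension reports the alignment the compiler expects for the vector type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef char v16qi __attribute__ ((vector_size(16)));

/* Hypothetical helper: true if p meets the alignment GCC assumes for v16qi. */
static bool is_v16qi_aligned(const void *p)
{
    return ((uintptr_t)p % __alignof__(v16qi)) == 0;
}

/* Hypothetical debug check to call before casting buf to (v16qi *). */
static void check_vector_buffer(const uint8_t *buf, size_t size)
{
    assert(size % sizeof(v16qi) == 0);  /* whole vectors only */
    assert(is_v16qi_aligned(buf));      /* the cast assumes this alignment */
}
```

This only detects a misaligned buffer; it does not make the load itself safe.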
The only other option I've thought of is to use a temporary vector and explicitly load the values (either via memcpy or element-by-element assignment), but in testing this counteracted most of the speedup gained from using SIMD instructions. Ideally I'd imagine this would be something like a generic __builtin_load() function, but none seems to exist.
What is a safer way to load data into a vector without risking alignment problems?
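For reference, the temporary-vector approach can be sketched as follows. This is only an illustration under stated assumptions: `load_v16u8` and `checksum_memcpy` are hypothetical names (not GCC builtins), and the sketch uses an unsigned element type, `v16u8`, so that lane wraparound is well defined. It makes no claim about performance.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: unsigned lanes, so each lane wraps modulo 256. */
typedef uint8_t v16u8 __attribute__ ((vector_size(16)));

/* Hypothetical helper: copy 16 bytes from any address into a vector. */
static inline v16u8 load_v16u8(const uint8_t *p)
{
    v16u8 v;
    memcpy(&v, p, sizeof v);  /* no alignment requirement on p */
    return v;
}

static uint8_t checksum_memcpy(const uint8_t *buf, size_t size)
{
    assert(size % 16 == 0);
    v16u8 vec = {0};

    for (size_t i = 0; i < size / 16; i++) {
        vec += load_v16u8(buf + i * 16);
    }

    /* Horizontal sum of the 16 lanes, modulo 256. */
    uint8_t sum = 0;
    for (int j = 0; j < 16; j++) {
        sum += vec[j];
    }
    return sum;
}
```

Because the copy goes through memcpy, `buf` may start at any byte offset, for example `checksum_memcpy(data + 1, 32)`.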
gcc vectorization simd checksum
dcoles