Aligned and unaligned loading and storing of SSE vectors - how to reduce code duplication?

Question

I often have to write two implementations of a function that uses SSE instructions, because the input and output buffers may have either aligned or unaligned addresses:

    void some_function_aligned(const float * src, size_t size, float * dst)
    {
        for(size_t i = 0; i < size; i += 4)
        {
            __m128 a = _mm_load_ps(src + i);
            // do something...
            _mm_store_ps(dst + i, a);
        }
    }

and

    void some_function_unaligned(const float * src, size_t size, float * dst)
    {
        for(size_t i = 0; i < size; i += 4)
        {
            __m128 a = _mm_loadu_ps(src + i);
            // do something...
            _mm_storeu_ps(dst + i, a);
        }
    }

So the question is: how can I reduce this code duplication, given that the two functions are almost identical?

+5

c++ sse simd

user4792273 Apr 15 '15 at 15:37

1 answer

Ermig · Accepted Answer · 2015-04-15T15:40:39+0000

There is a solution to this problem that is widely used in the Simd library (http://simd.sourceforge.net/). It is based on specializing template functions for loading and storing SSE vectors:

    // Primary templates are only declared; each value of 'align' gets its own specialization.
    template <bool align> __m128 load(const float * p);

    template <> inline __m128 load<false>(const float * p)
    {
        return _mm_loadu_ps(p);
    }

    template <> inline __m128 load<true>(const float * p)
    {
        return _mm_load_ps(p);
    }

    template <bool align> void store(float * p, __m128 a);

    template <> inline void store<false>(float * p, __m128 a)
    {
        _mm_storeu_ps(p, a);
    }

    template <> inline void store<true>(float * p, __m128 a)
    {
        _mm_store_ps(p, a);
    }

Now a single template implementation of the function is enough:

    template <bool align> void some_function(const float * src, size_t size, float * dst)
    {
        for(size_t i = 0; i < size; i += 4)
        {
            __m128 a = load<align>(src + i);
            // do something...
            store<align>(dst + i, a);
        }
    }
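The template still needs a caller that picks the `align` value. The following is only a sketch of one way to do that at run time, not something taken from the answer or from the library: the names `is_aligned16` and `some_function_auto` are hypothetical, the "do something" step is replaced by a placeholder that doubles each element, and `size` is assumed to be a multiple of 4.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <xmmintrin.h>  // SSE intrinsics (x86 / x86-64 only)

// Load/store helpers selected at compile time by the 'align' flag.
template <bool align> inline __m128 load(const float * p);
template <> inline __m128 load<false>(const float * p) { return _mm_loadu_ps(p); }
template <> inline __m128 load<true>(const float * p) { return _mm_load_ps(p); }

template <bool align> inline void store(float * p, __m128 a);
template <> inline void store<false>(float * p, __m128 a) { _mm_storeu_ps(p, a); }
template <> inline void store<true>(float * p, __m128 a) { _mm_store_ps(p, a); }

// Hypothetical helper: true if p lies on a 16-byte boundary.
inline bool is_aligned16(const void * p)
{
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Single implementation; the placeholder body doubles each element.
template <bool align> void some_function(const float * src, std::size_t size, float * dst)
{
    assert(size % 4 == 0);  // assumption: no scalar tail handling in this sketch
    for (std::size_t i = 0; i < size; i += 4)
    {
        __m128 a = load<align>(src + i);
        a = _mm_add_ps(a, a);  // stand-in for "do something..."
        store<align>(dst + i, a);
    }
}

// Hypothetical run-time dispatcher: uses the aligned variant only when
// both buffers start on a 16-byte boundary.
inline void some_function_auto(const float * src, std::size_t size, float * dst)
{
    if (is_aligned16(src) && is_aligned16(dst))
        some_function<true>(src, size, dst);
    else
        some_function<false>(src, size, dst);
}
```

Because the loop advances by four floats (16 bytes) per iteration, checking the two base pointers once is enough: if they are aligned, every `src + i` and `dst + i` is aligned as well.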
