I am trying to speed up my code using SSE, and the following code works well. Basically, each __m128 variable should point to 4 consecutive floats so that 4 operations are performed at once.
This code is equivalent to calculating c[i]=a[i]+b[i] with i from 0 to 3.
    float *data1, *data2, *data3;
    // ... code ... allocating data1-2-3 which are very long.
    __m128* a = (__m128*) (data1);
    __m128* b = (__m128*) (data2);
    __m128* c = (__m128*) (data3);
    *c = _mm_add_ps(*a, *b);
However, when I shift the data slightly (see below) to calculate c[i]=a[i+1]+b[i] with i from 0 to 3, it crashes at runtime.
    __m128* a = (__m128*) (data1+1); // <-- +1
    __m128* b = (__m128*) (data2);
    __m128* c = (__m128*) (data3);
    *c = _mm_add_ps(*a, *b);
I assume this is because __m128 is 128 bits wide while my float data is only 32 bits wide. So a pointer to a 128-bit value may not be allowed to hold an address that is not a multiple of 128 bits (16 bytes).
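To illustrate that suspicion, here is a minimal sketch of how the alignment of the two addresses could be checked. The helper `is_aligned_16` and the buffer `sample` are hypothetical names used only for this example; `sample` stands in for data1 and is assumed to start on a 16-byte boundary.

```cpp
#include <cstdint>

// Hypothetical helper: true when p sits on a 16-byte (128-bit) boundary.
inline bool is_aligned_16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Hypothetical stand-in for data1, forced onto a 16-byte boundary.
alignas(16) static float sample[8] = {0, 1, 2, 3, 4, 5, 6, 7};
```

With this setup, `is_aligned_16(sample)` holds, while `is_aligned_16(sample + 1)` does not, because `sample + 1` is only 4 bytes past the boundary. The next aligned address is `sample + 4`.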
In any case, do you know what the problem is and how I can get around this?
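One possible way around it, sketched under the assumption that the crash really is an alignment fault: dereferencing a __m128* normally compiles to an aligned load, which requires a 16-byte-aligned address, whereas the `_mm_loadu_ps` and `_mm_storeu_ps` intrinsics accept any address. The function name `add_shifted4` is hypothetical and not part of my actual code.

```cpp
#include <xmmintrin.h>  // SSE intrinsics

// Hypothetical helper: computes c[i] = a[i + 1] + b[i] for i = 0..3.
// a + 1 is generally not 16-byte aligned, so the unaligned load/store
// forms are used instead of dereferencing a __m128*.
inline void add_shifted4(const float* a, const float* b, float* c) {
    __m128 va = _mm_loadu_ps(a + 1);       // unaligned load of a[1..4]
    __m128 vb = _mm_loadu_ps(b);           // unaligned load of b[0..3]
    _mm_storeu_ps(c, _mm_add_ps(va, vb));  // unaligned store to c[0..3]
}
```

Calling `add_shifted4(data1, data2, data3)` would then replace the pointer casts for the shifted case. The unaligned forms can be slower than aligned ones on some CPUs, so this trades some speed for not faulting.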
c++ c pointers sse
Oli