Processing byte pixels using SSE / SSE2 intrinsics in C

I am writing a cross-platform C library that does various things to webcam images. All of the operations are per-pixel and highly parallelizable - for example, applying bit masks, multiplying color values by constants, etc. Therefore, I think I can gain performance by using SSE / SSE2 intrinsics.

However, I have a problem with the data format. My webcam library gives me webcam frames as a pointer (void *) to a buffer containing 24- or 32-bit byte pixels in ABGR or BGR format. I cast the pointer to char *, so that ptr++ etc. behaves correctly. However, all the SSE / SSE2 operations expect either four integers or four floats in the __m128 or __m64 data types. If I do this (assuming I have read the color values from the buffer into the chars r, g and b):

float pixel[] = {(float) r, (float) g, (float) b, 0.0f};

then load another float array full of constants

float constants[] = {0.299f, 0.587f, 0.114f, 0.0f};

and cast both float pointers to __m128 * and use _mm_mul_ps to compute r * 0.299, g * 0.587, etc., there is no overall performance gain, because all the shuffling of data takes so long!

Does anyone have any suggestions on how to load these byte pixel values into SSE registers quickly and efficiently, so that I actually get a performance gain from operating on them there?

+5
3 answers

If you want to use MMX ...

MMX gives you a set of 64-bit registers, each of which can be treated as eight 8-bit values.

Just like the 8-bit values you are working with.
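The same packed-byte idea carries over to SSE2, where a 128-bit __m128i register holds sixteen 8-bit values. Here is a minimal sketch, assuming SSE2 is available and swapping MMX for SSE2 integer intrinsics; brighten_bytes is a hypothetical helper made up for illustration, not something from the original answer. It adds a constant to every byte of a pixel buffer with saturation, so the bytes never have to be converted to floats:

```c
#include <emmintrin.h>  /* SSE2 integer intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: add `amount` to every byte in buf, clamping at 255.
   Processes 16 bytes per iteration, then finishes the tail in scalar code. */
void brighten_bytes(uint8_t *buf, size_t n, uint8_t amount)
{
    const __m128i add = _mm_set1_epi8((char)amount);  /* 16 copies of amount */
    size_t i = 0;

    for (; i + 16 <= n; i += 16) {
        /* Unaligned load/store, since the webcam buffer alignment is unknown. */
        __m128i px = _mm_loadu_si128((const __m128i *)(buf + i));
        px = _mm_adds_epu8(px, add);  /* unsigned saturating add per byte */
        _mm_storeu_si128((__m128i *)(buf + i), px);
    }

    /* Scalar tail for the remaining (n % 16) bytes. */
    for (; i < n; ++i) {
        unsigned v = (unsigned)buf[i] + amount;
        buf[i] = (uint8_t)(v > 255 ? 255 : v);
    }
}
```

Bit masks work the same way with _mm_and_si128 on the loaded register.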


+1


, 50 ... , FP, , 4 , , 1 15 , .

( ), MMX, , .

+1

-, , ( , void*) - .

-, SSE2, , - - ( , ).

, - unsigned char SSE2 ( , R, G B 0 255), , , .

But if you want to make it cross-platform, I suppose using intrinsics will be cleaner.
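As a rough sketch of what the intrinsics route could look like for the weighted sum in the question, the bytes can be widened to 16-bit integers inside the register and multiplied with fixed-point weights, avoiding the float conversion entirely. Everything here is an assumption made for illustration: the function name bgra_to_gray is hypothetical, the pixels are assumed to be 4 bytes each in memory order B, G, R, A, and the weights 29, 150 and 77 (out of 256) only approximate 0.114, 0.587 and 0.299.

```c
#include <emmintrin.h>  /* SSE2 integer intrinsics */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper. Assumed layout: 4 bytes per pixel, stored B, G, R, A.
   Writes one gray byte per pixel: (29*B + 150*G + 77*R) >> 8. */
void bgra_to_gray(const uint8_t *src, uint8_t *dst, size_t npixels)
{
    const __m128i zero = _mm_setzero_si128();
    /* 16-bit weights in B, G, R, A order for two pixels; alpha weight is 0. */
    const __m128i w = _mm_setr_epi16(29, 150, 77, 0, 29, 150, 77, 0);
    size_t i = 0;

    for (; i + 4 <= npixels; i += 4) {
        __m128i px = _mm_loadu_si128((const __m128i *)(src + 4 * i));

        /* Widen bytes to 16 bits by interleaving with zero. */
        __m128i lo = _mm_unpacklo_epi8(px, zero);  /* pixels 0 and 1 */
        __m128i hi = _mm_unpackhi_epi8(px, zero);  /* pixels 2 and 3 */

        /* Multiply and add adjacent pairs: 32-bit lanes become
           [29*B + 150*G, 77*R, 29*B + 150*G, 77*R]. */
        lo = _mm_madd_epi16(lo, w);
        hi = _mm_madd_epi16(hi, w);

        /* Add each odd lane into the even lane below it;
           lanes 0 and 2 now hold the full weighted sums. */
        lo = _mm_add_epi32(lo, _mm_srli_epi64(lo, 32));
        hi = _mm_add_epi32(hi, _mm_srli_epi64(hi, 32));

        /* Move lanes 0 and 2 into the low 64 bits, then join both halves. */
        lo = _mm_shuffle_epi32(lo, _MM_SHUFFLE(3, 1, 2, 0));
        hi = _mm_shuffle_epi32(hi, _MM_SHUFFLE(3, 1, 2, 0));
        __m128i y = _mm_srli_epi32(_mm_unpacklo_epi64(lo, hi), 8);

        /* Narrow 32 -> 16 -> 8 bits and store the four gray bytes. */
        y = _mm_packus_epi16(_mm_packs_epi32(y, zero), zero);
        int32_t four = _mm_cvtsi128_si32(y);
        memcpy(dst + i, &four, 4);
    }

    /* Scalar tail for the remaining pixels. */
    for (; i < npixels; ++i) {
        const uint8_t *p = src + 4 * i;
        dst[i] = (uint8_t)((29 * p[0] + 150 * p[1] + 77 * p[2]) >> 8);
    }
}
```

The weights sum to 256, so a fully white pixel still maps to 255. For 24-bit BGR input the load and unpack steps would need to change, since pixels no longer line up on 4-byte boundaries.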

Good luck

0