I am trying to extract 4 bytes from a 128-bit register efficiently. The problem is that each value sits in the low byte of its own 32-bit lane: {120,0,0,0,55,0,0,0,42,0,0,0,120,0,0,0}. I want to pack these 128 bits down to 32 bits of the form {120,55,42,120}.
The "raw" code is as follows:
__m128i byte_result_vec={120,0,0,0,55,0,0,0,42,0,0,0,120,0,0,0};
unsigned char * byte_result_array=(unsigned char*)&byte_result_vec;
result_array[x]=byte_result_array[0];
result_array[x+1]=byte_result_array[4];
result_array[x+2]=byte_result_array[8];
result_array[x+3]=byte_result_array[12];
My SSSE3 Code:
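unsigned int * result_array=...;
__m128i byte_result_vec={120,0,0,0,55,0,0,0,42,0,0,0,120,0,0,0};
// _mm_set_epi8 takes its arguments from byte 15 down to byte 0,
// so the source indices for output bytes 0..3 come last, in reverse order.
const __m128i eight_bit_shuffle_mask=_mm_set_epi8(1,1,1,1,1,1,1,1,1,1,1,1,12,8,4,0);
byte_result_vec=_mm_shuffle_epi8(byte_result_vec,eight_bit_shuffle_mask);
// The low 32 bits now hold the four packed bytes.
unsigned int * byte_result_array=(unsigned int*)&byte_result_vec;
result_array[x]=byte_result_array[0];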
How can I do this efficiently using SSE2? Is there a better version with SSSE3 or SSE4?
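One SSE2-only direction that might be worth comparing is to narrow the lanes with two saturating pack instructions instead of a byte shuffle. The following is only a sketch under assumptions: the helper name pack_low_bytes_sse2 is made up for illustration, and it assumes every 32-bit lane holds a value in 0..255 (as in the example data), since the packs saturate rather than truncate.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <string.h>     /* memcpy */

/* Hypothetical helper (the name is illustrative, not from any library).
   Writes the low byte of each 32-bit lane of v to out[0..3].
   Assumption: every lane is in 0..255; other values would be saturated. */
static void pack_low_bytes_sse2(__m128i v, unsigned char *out)
{
    /* 4 x int32 -> 8 x int16, signed saturation (lossless for 0..255) */
    __m128i w = _mm_packs_epi32(v, v);
    /* 8 x int16 -> 16 x uint8, unsigned saturation (lossless for 0..255) */
    __m128i b = _mm_packus_epi16(w, w);
    /* the low 32 bits now contain the four packed bytes */
    int packed = _mm_cvtsi128_si32(b);
    /* memcpy avoids alignment and aliasing problems on the destination */
    memcpy(out, &packed, sizeof packed);
}
```

With the data from the question this would be called as pack_low_bytes_sse2(byte_result_vec, &result_array[x]), assuming result_array is an unsigned char array as in the "raw" code. Whether it beats the single _mm_shuffle_epi8 would need to be measured on the target CPU.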