I am trying to convert code written with SSSE3 intrinsics to NEON SIMD and am stuck on the shuffle function (_mm_shuffle_epi8). I have looked at the GCC intrinsics documentation, the ARM manuals, and other forums, but couldn't find a solution.
CODE:
__m128i upper = _mm_loadu_si128((__m128i*)p1);
register __m128i mask1 = _mm_set_epi8 (0x80,0x80,0x80,0x80,0x80,0x80,0x80,12,0x80,10,0x80,7,0x80,4,0x80,1);
register __m128i mask2 = _mm_set_epi8 (0x80,0x80,0x80,0x80,0x80,0x80,12,0x80,10,0x80,7,0x80,4,0x80,1,0x80);
__m128i temp1_upper = _mm_or_si128(_mm_shuffle_epi8(upper,mask1),_mm_shuffle_epi8(upper,mask2));
The vtbl1_u8(uint8x8_t, uint8x8_t) intrinsic performs a table lookup that can be used to assign values to the destination register, but it only works with 64-bit registers. Also, the shuffle operation first tests the high bit of each mask byte, which would also have to be done in NEON, and I do not know how to do this efficiently:
r0 = (mask0 and 0x80)? 0: SELECT (a, mask0 and 0x0f) // SELECT (a, n) extracts the nth 8-bit element from a.
r1 = (mask1 and 0x80)? 0: SELECT (a, mask1 and 0x0f)
...
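To make the semantics above concrete, here is a minimal sketch in C; it is not a definitive port. The function names (shuffle_bytes_ref, shuffle_bytes_neon, interleave_selected) are hypothetical and exist only for illustration. The NEON path assumes that vtbl2_u8, which looks up 8 indices in a 16-byte table and returns 0 for any index of 16 or more, is an acceptable substitute: it matches the pseudocode only when every mask byte is either 0..15 or has the high bit set, which is the case for mask1 and mask2. Mask bytes in the range 16..127 would behave differently (SSE wraps them to the low 4 bits, NEON zeroes them).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Scalar reference for the pseudocode:
 * out[i] = (mask[i] & 0x80) ? 0 : src[mask[i] & 0x0f] */
static inline void shuffle_bytes_ref(const uint8_t src[16],
                                     const uint8_t mask[16],
                                     uint8_t out[16])
{
    for (int i = 0; i < 16; i++)
        out[i] = (mask[i] & 0x80) ? 0 : src[mask[i] & 0x0f];
}

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
/* NEON sketch: two 8-byte lookups into a 16-byte table.
 * Indices >= 16 (including 0x80) yield 0, so no explicit compare is needed
 * as long as the mask never contains values in 16..127. */
static inline uint8x16_t shuffle_bytes_neon(uint8x16_t src, uint8x16_t mask)
{
    uint8x8x2_t table;
    table.val[0] = vget_low_u8(src);
    table.val[1] = vget_high_u8(src);
    uint8x8_t lo = vtbl2_u8(table, vget_low_u8(mask));
    uint8x8_t hi = vtbl2_u8(table, vget_high_u8(mask));
    return vcombine_u8(lo, hi);
}
#endif

/* Hypothetical wrapper for the two-shuffle-plus-OR expression in the question.
 * The masks hold the same bytes as mask1/mask2, listed from byte 0 upward
 * (_mm_set_epi8 lists byte 15 first). */
static inline void interleave_selected(const uint8_t src[16], uint8_t out[16])
{
    static const uint8_t m1[16] = {1, 0x80, 4, 0x80, 7, 0x80, 10, 0x80,
                                   12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80};
    static const uint8_t m2[16] = {0x80, 1, 0x80, 4, 0x80, 7, 0x80, 10,
                                   0x80, 12, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80};
    assert(src != NULL && out != NULL);
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    uint8x16_t v = vld1q_u8(src);
    uint8x16_t r = vorrq_u8(shuffle_bytes_neon(v, vld1q_u8(m1)),
                            shuffle_bytes_neon(v, vld1q_u8(m2)));
    vst1q_u8(out, r);
#else
    /* Portable fallback so the sketch also runs on non-ARM hosts. */
    uint8_t a[16], b[16];
    shuffle_bytes_ref(src, m1, a);
    shuffle_bytes_ref(src, m2, b);
    for (int i = 0; i < 16; i++)
        out[i] = a[i] | b[i];
#endif
}
```

Because the two masks never select a byte into the same lane, the OR of the two shuffles should equal a single lookup with the merged index vector {1,1,4,4,7,7,10,10,12,12,0x80,...}, which would halve the number of table lookups. On AArch64, vqtbl1q_u8 performs the 16-byte lookup in one instruction with the same out-of-range-gives-zero behavior.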