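Depending on what you do with the results, it may be worth building vectors from scalar loads with PINSRD instead of using gathers. In general, though, SIMD table lookups only pay off when your use case has special properties you can exploit. (See the x86 wiki for performance links, especially Agner Fog's insn tables and microarch pdf.)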
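For small tables, pshufb works as a parallel LUT with 4-bit indices. It does 16 lookups at once into a table of 16 8-bit entries. For wider entries, do two lookups and combine the results with, for example, punpcklbw to get 16-bit elements. (Use separate LUTs for the high and low bytes of each entry, with the same 4-bit indices.)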
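For example, this is how GF16 multiplication of a whole buffer by one constant gets vectorized. (For example, for Reed-Solomon error-correction codes.) As I said, taking advantage of this depends on the special properties of your use case.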
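Here is a minimal sketch of the pshufb-as-LUT idea applied to GF16 multiplication by a constant. The function names and the reduction polynomial x^4 + x + 1 are my assumptions for illustration, and the `target` attribute is GCC/Clang-specific. It builds a 16-entry product table for the constant, then uses it for 16 lookups at once.

```c
#include <immintrin.h>
#include <stdint.h>

/* Scalar GF(16) multiply. Assumes the reduction polynomial x^4 + x + 1
   (0x13); this choice is an assumption made for the example. */
static uint8_t gf16_mul(uint8_t a, uint8_t b) {
    uint8_t r = 0;
    for (int i = 0; i < 4; i++) {
        if (b & 1) r ^= a;          /* add a * x^i when bit i of b is set */
        b >>= 1;
        a <<= 1;                    /* multiply a by x */
        if (a & 0x10) a ^= 0x13;    /* reduce modulo x^4 + x + 1 */
    }
    return r & 0x0F;
}

/* Multiply 16 GF(16) elements (one per byte, low nibble) by constant c. */
__attribute__((target("ssse3")))
static void gf16_mul_const_16(uint8_t *dst, const uint8_t *src, uint8_t c) {
    uint8_t table[16];
    for (int i = 0; i < 16; i++)
        table[i] = gf16_mul((uint8_t)i, c);   /* table[x] = x * c */

    __m128i lut = _mm_loadu_si128((const __m128i *)table);
    __m128i idx = _mm_loadu_si128((const __m128i *)src);
    idx = _mm_and_si128(idx, _mm_set1_epi8(0x0F));  /* keep 4-bit indices */
    /* pshufb: each result byte is lut[idx byte] */
    _mm_storeu_si128((__m128i *)dst, _mm_shuffle_epi8(lut, idx));
}
```

In a real loop you would build the table once per constant and reuse it for the whole buffer, which is where the win comes from.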
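AVX2 can do two 128b pshufb operations in parallel, one in each lane of a 256b vector. For bigger tables, the next step is AVX512F: __m512i _mm512_permutex2var_epi32 (__m512i a, __m512i idx, __m512i b). There are byte (vpermi2b, AVX512VBMI), word (vpermi2w, AVX512BW), dword (this one, vpermi2d, AVX512F) and qword (vpermi2q, AVX512F) element-size versions. It is a full lane-crossing shuffle that indexes into the concatenation of two source registers. (Like AMD XOP's vpperm.)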
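The two instructions behind this one intrinsic (vpermt2d/vpermi2d) differ in which operand gets overwritten with the result: the table or the index vector. The compiler picks one based on which inputs are reused.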
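To make the indexing concrete, here is a scalar reference model of what _mm512_permutex2var_epi32 computes. It is plain C for illustration, not AVX-512 code, and the function name is hypothetical. Bits 3:0 of each index select an element and bit 4 selects the source, so the two registers act as one 32-entry table.

```c
#include <stdint.h>

/* Scalar model of _mm512_permutex2var_epi32 (vpermi2d / vpermt2d). */
static void permutex2var_epi32_model(uint32_t dst[16], const uint32_t a[16],
                                     const uint32_t idx[16],
                                     const uint32_t b[16]) {
    uint32_t tmp[16];                    /* lets dst alias any input */
    for (int i = 0; i < 16; i++) {
        uint32_t sel = idx[i] & 31;      /* higher index bits are ignored */
        tmp[i] = (sel & 16) ? b[sel & 15] : a[sel & 15];
    }
    for (int i = 0; i < 16; i++)
        dst[i] = tmp[i];
}
```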
The core of your loop is:
*dst++ = src[*lut++];
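The lookup table here is actually src, not the variable you have named lut. lut walks through an array of indices that acts as a shuffle-control mask for src.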
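Written out as a complete function, the scalar version looks like this. The function name and the int16_t / uint8_t element types are assumptions for the example, not taken from your code.

```c
#include <stddef.h>
#include <stdint.h>

/* dst[i] = src[lut[i]]: src is the table, lut supplies the indices. */
static void permute_scalar(int16_t *dst, const int16_t *src,
                           const uint8_t *lut, size_t n) {
    for (size_t i = 0; i < n; i++)
        *dst++ = src[*lut++];
}
```

With src = {10, 20, 30, 40} and lut = {3, 3, 0, 1}, dst ends up as {40, 40, 10, 20}: the same src element can be read more than once, and some may never be read.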
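For best performance, make g_tables an array of uint8_t. Its entries are only 0..63, so they fit. Zero-extending loads are as cheap as regular loads, so all this does is shrink the cache footprint. To use the indices with AVX2 gathers, zero-extend them with vpmovzxbd. The intrinsic is awkward to use as a load, because there is no form that takes an int64_t *, only __m256i _mm256_cvtepu8_epi32 (__m128i a), which takes a __m128i. This is one of the major design flaws of intrinsics, IMO.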
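A sketch of how byte indices can feed an AVX2 gather, assuming 32-bit table elements. The function name is hypothetical and the `target` attribute is GCC/Clang-specific. The usual workaround for the missing pointer form is a movq load via _mm_loadl_epi64, whose result goes straight into _mm256_cvtepu8_epi32.

```c
#include <immintrin.h>
#include <stdint.h>

/* dst[i] = src[idx[i]] for 8 elements, with uint8_t indices. */
__attribute__((target("avx2")))
static void gather8_u8idx(int32_t *dst, const int32_t *src,
                          const uint8_t *idx) {
    /* movq: load exactly 8 index bytes into the low half of an xmm */
    __m128i bytes = _mm_loadl_epi64((const __m128i *)idx);
    /* vpmovzxbd: zero-extend 8 bytes to 8 dwords */
    __m256i idx32 = _mm256_cvtepu8_epi32(bytes);
    /* vpgatherdd with a scale of 4 bytes per element */
    __m256i v = _mm256_i32gather_epi32((const int *)src, idx32, 4);
    _mm256_storeu_si256((__m256i *)dst, v);
}
```

Compilers can often fold the movq load into a memory-source vpmovzxbd, but that is not guaranteed. Check for AVX2 at run time (for example with __builtin_cpu_supports("avx2") on GCC/Clang) before calling this.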
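I don't have any great ideas for speeding up your loop. As far as I can tell, scalar code is probably the way to go here. From what I can see, the SIMD code shuffles 64 int16_t values into a new destination. It took me a while to work that out, because I did not spot the if (sizeof...) line right away, and there are no comments. :( It would be easier to read with meaningful variable names instead of avx0... Using x86 gather instructions on elements smaller than 4B does require annoying masking. Still, instead of shifting and ORing, you can combine the results with a pack instruction.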
AVX512 can handle the sizeof(T) == sizeof(int8_t) and sizeof(T) == sizeof(int16_t) cases, because all of src then fits in one or two zmm registers.
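If g_tables is used as a LUT, AVX512 handles that easily too, with vpermi2b. Without AVX512 it is hard, because a 64-byte table is too big for pshufb. Using four lanes (16B each) of pshufb per input vector can work: mask off the indices outside 0..15, then outside 16..31, and so on, with pcmpgtb or something similar. Then OR the four results together. So this is quite expensive.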
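A sketch of the four-pshufb approach for a 64-byte table, assuming every index is in 0..63. The function name is hypothetical, and I use pcmpeqb on the chunk number rather than pcmpgtb range checks, which is one variant of the same masking idea.

```c
#include <immintrin.h>
#include <stdint.h>

/* dst[i] = table[idx[i]] for 16 byte indices, each assumed to be 0..63. */
__attribute__((target("ssse3")))
static void lut64_lookup16(uint8_t *dst, const uint8_t table[64],
                           const uint8_t *idx) {
    __m128i vidx = _mm_loadu_si128((const __m128i *)idx);
    /* Chunk number = index / 16. There is no byte shift, so shift 16-bit
       lanes and mask off the bits that leak in from the neighbouring byte. */
    __m128i chunk = _mm_and_si128(_mm_srli_epi16(vidx, 4),
                                  _mm_set1_epi8(0x03));
    __m128i result = _mm_setzero_si128();
    for (int k = 0; k < 4; k++) {
        __m128i tbl = _mm_loadu_si128((const __m128i *)(table + 16 * k));
        /* pshufb uses only the low 4 index bits (bit 7 is clear here) */
        __m128i hit = _mm_shuffle_epi8(tbl, vidx);
        /* keep only the bytes whose index falls in this 16-byte chunk */
        __m128i in_chunk = _mm_cmpeq_epi8(chunk, _mm_set1_epi8((char)k));
        result = _mm_or_si128(result, _mm_and_si128(hit, in_chunk));
    }
    _mm_storeu_si128((__m128i *)dst, result);
}
```

That is four shuffles, four compares and eight AND/OR operations per 16 output bytes, which shows why this only makes sense if nothing better is available.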
Some ideas:
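Depending on the contents of g_tables, hand-designed shuffles may work. Data from the middle of src can be pulled out with pshufb or pshufd and then stored. (Possibly just with pextrd or pextrq, or better, movq from the low half of a vector. Or even a full-vector movdqu.)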
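Otherwise, you could load several vectors of src and shuffle between them with shufps. It generally works on integer data without a slowdown, except on Nehalem (and possibly also Core2). punpcklwd/dq/qdq (and the corresponding punpckhwd etc.) interleave the elements of two vectors and give you different data-movement options than shufps.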
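A small sketch of both kinds of two-source shuffle on integer data. The function names and the chosen element pattern are only for illustration. shufps takes its two low result dwords from the first source and its two high result dwords from the second, while punpcklwd interleaves the low 16-bit elements.

```c
#include <immintrin.h>
#include <stdint.h>

/* dst = { a[0], a[2], b[1], b[3] } using shufps on integer data. */
__attribute__((target("sse2")))
static void pick_dwords(int32_t dst[4], const int32_t a[4],
                        const int32_t b[4]) {
    __m128 va = _mm_castsi128_ps(_mm_loadu_si128((const __m128i *)a));
    __m128 vb = _mm_castsi128_ps(_mm_loadu_si128((const __m128i *)b));
    /* _MM_SHUFFLE lists selectors from the highest result element down */
    __m128 r = _mm_shuffle_ps(va, vb, _MM_SHUFFLE(3, 1, 2, 0));
    _mm_storeu_si128((__m128i *)dst, _mm_castps_si128(r));
}

/* dst = { a[0], b[0], a[1], b[1], a[2], b[2], a[3], b[3] } (punpcklwd). */
__attribute__((target("sse2")))
static void interleave_lo_words(int16_t dst[8], const int16_t a[8],
                                const int16_t b[8]) {
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)dst, _mm_unpacklo_epi16(va, vb));
}
```

The casts between __m128i and __m128 are free; they only exist to satisfy the type system.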
If it does not take too many instructions to build a few full 16B vectors, you are in good shape.
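If g_tables can take too many different values, then, in principle, you could JIT-compile a custom shuffle function. That is probably very hard to do well, though.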