Without XOP, your options are limited. If you can control the format of the shift count argument, you can use _mm_mullo_epi16 (the SSE2 counterpart of the MMX _mm_mullo_pi16), since multiplying by a power of two is the same as shifting left by that power.
For example, if you want to shift the 8 16-bit elements in an SSE register left by <0, 1, 2, 3, 4, 5, 6, 7>, you can multiply by 2 raised to each shift count, i.e. by <1, 2, 4, 8, 16, 32, 64, 128>.
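A minimal sketch of that example, assuming SSE2 is available and that you only need left shifts. The helper name `shift_left_by_lane_index` is made up for illustration; the intrinsics are the standard ones from `<emmintrin.h>`.

```c
#include <emmintrin.h>  /* SSE2: __m128i, _mm_mullo_epi16, _mm_setr_epi16 */
#include <stdint.h>

/* Hypothetical helper: shifts lane i of `in` left by i bits (i = 0..7)
 * by multiplying each lane by 2^i and keeping the low 16 bits. */
static void shift_left_by_lane_index(const uint16_t in[8], uint16_t out[8])
{
    /* Unaligned load of the eight 16-bit inputs. */
    __m128i v = _mm_loadu_si128((const __m128i *)in);

    /* Per-lane multipliers 2^0 .. 2^7, lowest lane first. */
    const __m128i pow2 = _mm_setr_epi16(1, 2, 4, 8, 16, 32, 64, 128);

    /* Low 16 bits of each product equal the left-shifted value. */
    __m128i r = _mm_mullo_epi16(v, pow2);

    _mm_storeu_si128((__m128i *)out, r);
}
```

If the shift counts are only known at run time, you would have to build the vector of powers of two yourself (for instance from a small lookup table), which is why this only helps when you control how the counts are supplied.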
mattst88