How to perform an element-wise left shift with __m128i?

The SSE shift intrinsics I found can only shift all elements by the same amount:

  • _mm_sll_epi32()
  • _mm_slli_epi32()

They shift every element, but all by the same count.

Is there a way to apply a different shift to each element? Something like this:

 __m128i a, b;
 r0 := a0 << b0;
 r1 := a1 << b1;
 r2 := a2 << b2;
 r3 := a3 << b3;
c sse avx
3 answers

The _mm_shl_epi32() intrinsic does exactly that.

http://msdn.microsoft.com/en-us/library/gg445138.aspx

However, it requires the XOP instruction set. Only AMD Bulldozer and Interlagos processors or later have this instruction. It is not available on any Intel processor.

If you want to do this without XOP instructions, you will need to do it the hard way: pull the elements out and shift them one by one.

Without XOP instructions, you can do this with SSE4.1 using the following intrinsics:

  • _mm_insert_epi32()
  • _mm_extract_epi32()

http://software.intel.com/sites/products/documentation/studio/composer/en-us/2011/compiler_c/intref_cls/common/intref_sse41_reg_ins_ext.htm

These let you extract parts of the 128-bit register into general-purpose registers, shift them there, and insert the results back.

If you resort to this last method, it will be terribly inefficient. This is why _mm_shl_epi32() exists in the first place.
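
As a rough illustration of the extract/shift/insert route, here is a minimal sketch. The helper name shl_epi32_per_lane and the TARGET_SSE41 macro are made up for this example, and it assumes every shift count is in the range 0..31 and that the CPU supports SSE4.1.

```c
#include <smmintrin.h>  /* SSE4.1: _mm_extract_epi32, _mm_insert_epi32 */
#include <stdint.h>

/* Lets GCC/Clang build this one function for SSE4.1 without -msse4.1;
   on other compilers, enable SSE4.1 through the usual compiler option. */
#if defined(__GNUC__)
#define TARGET_SSE41 __attribute__((target("sse4.1")))
#else
#define TARGET_SSE41
#endif

/* Hypothetical helper: r[i] = a[i] << b[i] for the four 32-bit lanes.
   Assumes each count in b is in 0..31. */
TARGET_SSE41
static __m128i shl_epi32_per_lane(__m128i a, __m128i b)
{
    /* Pull each lane out into a general-purpose register. */
    uint32_t a0 = (uint32_t)_mm_extract_epi32(a, 0);
    uint32_t a1 = (uint32_t)_mm_extract_epi32(a, 1);
    uint32_t a2 = (uint32_t)_mm_extract_epi32(a, 2);
    uint32_t a3 = (uint32_t)_mm_extract_epi32(a, 3);
    uint32_t b0 = (uint32_t)_mm_extract_epi32(b, 0);
    uint32_t b1 = (uint32_t)_mm_extract_epi32(b, 1);
    uint32_t b2 = (uint32_t)_mm_extract_epi32(b, 2);
    uint32_t b3 = (uint32_t)_mm_extract_epi32(b, 3);

    /* Shift in scalar code, then rebuild the vector lane by lane. */
    __m128i r = _mm_cvtsi32_si128((int)(a0 << b0));
    r = _mm_insert_epi32(r, (int)(a1 << b1), 1);
    r = _mm_insert_epi32(r, (int)(a2 << b2), 2);
    r = _mm_insert_epi32(r, (int)(a3 << b3), 3);
    return r;
}
```

The number of extract and insert operations needed for just four lanes gives a sense of why this path is slow.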


Without XOP, your options are limited. If you can control the format of the shift count argument, you can use _mm_mullo_epi16, since multiplying by a power of two is the same as shifting left by that exponent.

For example, if you want to shift 8 16-bit elements in an SSE register by <0, 1, 2, 3, 4, 5, 6, 7>, you can multiply by 2 raised to the shift counts, i.e. by <1, 2, 4, 8, 16, 32, 64, 128>.
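
The example above could look roughly like this with SSE2 only. The helper name shl_epi16_by_0_to_7 is made up for illustration, and the shift counts are baked in as a constant vector of powers of two.

```c
#include <emmintrin.h>  /* SSE2: _mm_mullo_epi16, _mm_setr_epi16 */
#include <stdint.h>

/* Hypothetical helper: shifts the eight 16-bit lanes of a left by
   0, 1, 2, ..., 7 bits respectively by multiplying with 2^count. */
static __m128i shl_epi16_by_0_to_7(__m128i a)
{
    /* Lane i holds 1 << i, so the multiply acts as a per-lane left shift. */
    const __m128i pow2 = _mm_setr_epi16(1, 2, 4, 8, 16, 32, 64, 128);

    /* _mm_mullo_epi16 keeps the low 16 bits of each product, which matches
       the bits that a 16-bit left shift would keep. */
    return _mm_mullo_epi16(a, pow2);
}
```

Because only the low 16 bits of each product are kept, bits shifted past the top of a lane are discarded, just as with a real shift.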


In some cases, this can replace _mm_shl_epi32(a, b):

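 _mm_mullo_epi32(a, 1 << b); /* pseudocode: "1 << b" stands for a vector holding 1 << b[i] in each lane; _mm_mullo_epi32 needs SSE4.1 */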

Generally speaking, this requires b to be a constant. I don't know of an efficient way to compute (1 << b) using older SSE instructions.
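
For variable counts, one commonly used trick (not part of the answer above, so treat it as a sketch) is to build 2^b through the floating-point exponent field: the bit pattern (b + 127) << 23 is the float 2.0^b, which can then be converted back to an integer. The names pow2_epi32, shl_epi32_mul and TARGET_SSE41 are made up for this example, and it assumes every count is in 0..30 and that SSE4.1 is available for the 32-bit multiply.

```c
#include <smmintrin.h>  /* SSE4.1: _mm_mullo_epi32 (the rest is SSE2) */
#include <stdint.h>

/* Lets GCC/Clang build one function for SSE4.1 without -msse4.1;
   on other compilers, enable SSE4.1 through the usual compiler option. */
#if defined(__GNUC__)
#define TARGET_SSE41 __attribute__((target("sse4.1")))
#else
#define TARGET_SSE41
#endif

/* Hypothetical helper: returns 1 << b[i] in each 32-bit lane, using SSE2 only.
   Assumes each count is in 0..30. */
static __m128i pow2_epi32(__m128i b)
{
    /* Place (b + 127) in the exponent field of an IEEE-754 single:
       the resulting bit pattern is the float value 2.0^b. */
    __m128i bits = _mm_slli_epi32(_mm_add_epi32(b, _mm_set1_epi32(127)), 23);

    /* Convert the floats 2.0^b back to 32-bit integers. */
    return _mm_cvttps_epi32(_mm_castsi128_ps(bits));
}

/* Hypothetical helper: r[i] = a[i] << b[i], done as a multiply by 2^b[i]. */
TARGET_SSE41
static __m128i shl_epi32_mul(__m128i a, __m128i b)
{
    return _mm_mullo_epi32(a, pow2_epi32(b));
}
```

Whether this beats the extract/insert approach depends on the target CPU, so it is worth measuring before relying on it.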
