Equivalent of SSE unpacklo_ps / unpackhi_ps in AVX (for doubles)

In SSE, if I have a 128-bit register containing 4 floats, i.e.

A = abcd ('a','b','c','d' are floats and 'A' is a 128-bit SSE register) 

and

 B = efgh 

then if I want

 C = aebf 

I can just do:

 C = _mm_unpacklo_ps(A,B); 

Similarly, if I want

 D = cgdh 

I can do:

 D = _mm_unpackhi_ps(A,B); 

If I have an AVX register containing doubles, is it possible to do the same with a single instruction?

From the way these intrinsics work, I know that none of _mm256_unpacklo_pd(), _mm256_shuffle_pd(), _mm256_permute2f128_pd() or _mm256_blend_pd() gives this result on its own. Is there another instruction I can use, or do I need to combine the instructions above?
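
To make the problem concrete, here is a rough scalar model of what the 256-bit unpack instructions do to doubles. It is only a sketch: model_unpacklo_pd and model_unpackhi_pd are made-up names for illustration, not real intrinsics, and plain arrays stand in for the registers. The point is that the 256-bit versions interleave inside each 128-bit half separately, so they do not behave like a wider _mm_unpacklo_ps.

```c
/* Scalar model of _mm256_unpacklo_pd / _mm256_unpackhi_pd.
 * Hypothetical helper names; arrays of 4 doubles stand in for __m256d.
 * Element 0 is the lowest element of the register. */

/* Each 128-bit lane holds 2 doubles; take the LOW double of every lane. */
static void model_unpacklo_pd(const double a[4], const double b[4], double out[4])
{
    for (int lane = 0; lane < 2; ++lane) {
        out[2 * lane]     = a[2 * lane];      /* low element of a's lane */
        out[2 * lane + 1] = b[2 * lane];      /* low element of b's lane */
    }
}

/* Same idea, but take the HIGH double of every lane. */
static void model_unpackhi_pd(const double a[4], const double b[4], double out[4])
{
    for (int lane = 0; lane < 2; ++lane) {
        out[2 * lane]     = a[2 * lane + 1];  /* high element of a's lane */
        out[2 * lane + 1] = b[2 * lane + 1];  /* high element of b's lane */
    }
}
```

With A = abcd and B = efgh, this model gives aecg for the "lo" variant and bfdh for the "hi" variant, rather than the aebf and cgdh that I am after.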

1 answer

One way that I can think of is as follows:

 A1 = _mm256_unpacklo_pd(A,B);
 A2 = _mm256_unpackhi_pd(A,B);
 C = _mm256_permute2f128_pd(A1,A2,0x20);
 D = _mm256_permute2f128_pd(A1,A2,0x31);
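
For anyone who wants to try this, the four intrinsics could be wrapped roughly as below. This is a sketch under some assumptions: the function name interleave_pd and the pointer-based interface are mine, the target attribute is GCC/Clang-specific (it lets the function build without -mavx), and the CPU is assumed to support AVX.

```c
#include <immintrin.h>

/* Hypothetical wrapper: reads A = a[0..3] and B = b[0..3],
 * writes C = {a0, b0, a1, b1} to lo and D = {a2, b2, a3, b3} to hi. */
__attribute__((target("avx")))
static void interleave_pd(const double *a, const double *b,
                          double *lo, double *hi)
{
    __m256d A = _mm256_loadu_pd(a);
    __m256d B = _mm256_loadu_pd(b);

    /* In-lane interleave: A1 = {a0, b0, a2, b2}, A2 = {a1, b1, a3, b3} */
    __m256d A1 = _mm256_unpacklo_pd(A, B);
    __m256d A2 = _mm256_unpackhi_pd(A, B);

    /* 0x20: low lane of A1, then low lane of A2  -> {a0, b0, a1, b1} */
    __m256d C = _mm256_permute2f128_pd(A1, A2, 0x20);
    /* 0x31: high lane of A1, then high lane of A2 -> {a2, b2, a3, b3} */
    __m256d D = _mm256_permute2f128_pd(A1, A2, 0x31);

    _mm256_storeu_pd(lo, C);
    _mm256_storeu_pd(hi, D);
}
```

The unaligned loads and stores are used only so the sketch works with any array of doubles; if the data is known to be 32-byte aligned, _mm256_load_pd and _mm256_store_pd could be used instead.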

If anyone has a better solution, please post it below.
