Checking equality between two __m128i variables

If I want to do a bitwise equality test between two __m128i variables, am I required to use an SSE instruction, or can I use == ? If == does not work, which SSE instruction should I use?

+8
c x86 sse simd
3 answers

Although using _mm_movemask_epi8 is one solution, if you have a processor with SSE4.1 I think a better solution is to use an instruction that sets the zero or carry flag in the FLAGS register. This saves a test or cmp instruction.

You can do that like this:

    if (_mm_test_all_ones(_mm_cmpeq_epi8(v1, v2))) {
        // v1 == v2
    }

Edit: as Paul R pointed out, _mm_test_all_ones generates two instructions: pcmpeqd and ptest. Together with _mm_cmpeq_epi8, that is three instructions in total. Here is a better solution that uses only two instructions:

    __m128i neq = _mm_xor_si128(v1, v2);
    if (_mm_test_all_zeros(neq, neq)) {
        // v1 == v2
    }

This generates:

    pxor    %xmm1, %xmm0
    ptest   %xmm0, %xmm0
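For anyone who wants to try the XOR-plus-ptest approach in isolation, here is a minimal sketch. The helper names m128i_equal_sse41 and bytes16_equal_sse41 and the TARGET_SSE41 macro are made up for illustration; the macro assumes GCC or clang, where a per-function target attribute lets the code compile without -msse4.1 (other compilers may need their own flags).

```c
#include <smmintrin.h>  /* SSE4.1: _mm_testz_si128 */
#include <stdint.h>

/* Assumption: GCC/clang accept a per-function target attribute. */
#if defined(__GNUC__) || defined(__clang__)
#define TARGET_SSE41 __attribute__((target("sse4.1")))
#else
#define TARGET_SSE41
#endif

/* Hypothetical helper: returns 1 if all 128 bits match, 0 otherwise. */
TARGET_SSE41
static int m128i_equal_sse41(__m128i a, __m128i b)
{
    __m128i diff = _mm_xor_si128(a, b);  /* PXOR: all zeros iff a == b */
    return _mm_testz_si128(diff, diff);  /* PTEST: 1 iff (diff & diff) == 0 */
}

/* Hypothetical wrapper: compares two 16-byte buffers (unaligned loads). */
TARGET_SSE41
static int bytes16_equal_sse41(const uint8_t *p, const uint8_t *q)
{
    __m128i a = _mm_loadu_si128((const __m128i *)p);
    __m128i b = _mm_loadu_si128((const __m128i *)q);
    return m128i_equal_sse41(a, b);
}
```

The code still needs a CPU that supports SSE4.1 at run time; the attribute only affects compilation.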
+9

You can use a compare and then extract a mask from the comparison result:

    __m128i vcmp = _mm_cmpeq_epi8(v0, v1);       // PCMPEQB
    uint16_t vmask = _mm_movemask_epi8(vcmp);    // PMOVMSKB
    if (vmask == 0xffff)
    {
        // v0 == v1
    }

This works with SSE2 and later.
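As a self-contained sketch of the SSE2 approach (the function names m128i_equal_sse2 and bytes16_equal_sse2 are hypothetical, chosen just for this example):

```c
#include <emmintrin.h>  /* SSE2 */
#include <stdint.h>

/* Hypothetical helper: returns 1 if all 16 bytes match, 0 otherwise. */
static int m128i_equal_sse2(__m128i a, __m128i b)
{
    __m128i eq = _mm_cmpeq_epi8(a, b);  /* PCMPEQB: 0xFF in each equal byte */
    int mask = _mm_movemask_epi8(eq);   /* PMOVMSKB: one bit per byte */
    return mask == 0xFFFF;              /* all 16 bits set => all bytes equal */
}

/* Hypothetical wrapper: compares two 16-byte buffers (unaligned loads). */
static int bytes16_equal_sse2(const uint8_t *p, const uint8_t *q)
{
    __m128i a = _mm_loadu_si128((const __m128i *)p);
    __m128i b = _mm_loadu_si128((const __m128i *)q);
    return m128i_equal_sse2(a, b);
}
```

Because every byte must compare equal for the mask to be 0xFFFF, this is a full 128-bit equality test even though the comparison is done per byte.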

As @Zboson noted, if you have SSE4.1 you can do it like this instead, which may be slightly more efficient because it is just two SSE instructions followed by a test of a flag (ZF):

    __m128i vcmp = _mm_xor_si128(v0, v1);    // PXOR
    if (_mm_testz_si128(vcmp, vcmp))         // PTEST (requires SSE4.1)
    {
        // v0 == v1
    }

FWIW, I just benchmarked both of these implementations on a Haswell Core i7, using clang to compile the test harness, and the timing results were very similar: the SSE4 implementation appears to be very slightly faster, but the difference is hard to measure.

+10

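Consider using the SSE4.1 ptest instruction:

    if (_mm_testc_si128(v0, v1)) {
        // every bit set in v1 is also set in v0
    } else {
        // v1 has at least one bit set that v0 lacks
    }

ptest takes two 128-bit operands (representing integer data) and sets two flags: ZF is set if the bitwise AND of the operands is all zeros, and CF is set if the bitwise AND of the second operand with the complement of the first is all zeros. _mm_testc_si128 returns CF, so a single call is a one-directional check and can return 1 for unequal inputs. For a true equality test, check both directions (_mm_testc_si128(v0, v1) && _mm_testc_si128(v1, v0)), or apply _mm_testz_si128 to the XOR of the two vectors.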
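A quick way to see what _mm_testc_si128 reports is to feed it a few known patterns. The sketch below is only illustrative: testc_demo, equal_via_testc, and the TARGET_SSE41 macro are hypothetical names, and the macro assumes GCC or clang so that the code builds without -msse4.1.

```c
#include <smmintrin.h>  /* SSE4.1: _mm_testc_si128 */

/* Assumption: GCC/clang accept a per-function target attribute. */
#if defined(__GNUC__) || defined(__clang__)
#define TARGET_SSE41 __attribute__((target("sse4.1")))
#else
#define TARGET_SSE41
#endif

/* Hypothetical helper: broadcasts two 32-bit patterns and returns
 * the CF result of PTEST, i.e. 1 iff (~a & b) is all zeros. */
TARGET_SSE41
static int testc_demo(int a_bits, int b_bits)
{
    __m128i a = _mm_set1_epi32(a_bits);
    __m128i b = _mm_set1_epi32(b_bits);
    return _mm_testc_si128(a, b);
}

/* Hypothetical helper: equality built from testc in both directions. */
TARGET_SSE41
static int equal_via_testc(int a_bits, int b_bits)
{
    __m128i a = _mm_set1_epi32(a_bits);
    __m128i b = _mm_set1_epi32(b_bits);
    return _mm_testc_si128(a, b) && _mm_testc_si128(b, a);
}
```

With these helpers, testc_demo(0xFF, 0x0F) returns 1 even though the inputs differ, because every bit set in the second operand is also set in the first, while equal_via_testc(0xFF, 0x0F) returns 0.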

+1