128-bit hash comparison with SSE


In my current project, I have to compare 128-bit values ​​(actually md5 hashes), and I thought that it would be possible to speed up the comparison using SSE instructions. My problem is that I cannot find good documentation on SSE instructions; I am looking for a 128-bit integer comparison instruction that will tell me if one hash is greater, less than or equal to another. Is there such an instruction?

PS: target computers are x86_64 servers with SSE2 instructions; I am also interested in the NEON instructions for the same job.

+6
c assembly inline-assembly sse neon
source share
3 answers

The SSE or NEON instruction sets do not have 128-bit integer comparison instructions.

SSE4.1 added a vector 64-bit integer comparison: PCMPEQQ and PCMPGTQ, but because of how they are implemented, it’s not easy to put two of them together into a 128-bit comparison.

The preferred way to perform a 128-bit comparison on x86_64 is to use a 64-bit comparison for a high word, and then an additional 64-bit comparison of a low word only if the high words are compared equal:

cmp {ahi}, {bhi} jne 0f cmp {alo}, {blo} 0: // flags are now set as though a comparison of unsigned 128-bit values // was performed; signed comparisons are a bit different. 

In ARM, a common idiom is a series of conditional word comparisons to set flags as needed.

+7
source share

In fact, a 128-bit comparison of the two values a and b possible using SSE 4.1 with two instructions and a backup register set to zero.

In x86 build, using the deprecated 128-bit SSE:

  pxor %xmm2, %xmm2 # set xmm2 to zero. Should be moved out of the loop. # compare %xmm0 to %xmm1 for equality pxor %xmm0, %xmm1 # xmm1 is zero if both operands are equal ptest %xmm2, %xmm1 # test not(xmm2) and xmm1. If any bit in xmm1 is set jc equal # the carry flag is cleared. not_equal: ... equal: 

Using the built-in functions in C is preferable since they automatically benefit from the syntax of the AVX 3 operands, which actually saves a significant number of SSE register moves.

 static const __m128i zero = {0}; inline bool compare128(__m128i a, __m128i b) { __m128i c = _mm_xor_si128(a, b); return _mm_testc_si128(zero, c); } 

This compiles with something similar, as mentioned above, especially the bool temporary addition, and the carry flag is used directly.

+6
source share

PCMPGT will not compare all 128 bits, it will always work with smaller units and produce separate results. In addition, it works with signed values, which complicates the situation.

If you work in 64-bit mode, I think that it would be the fastest to use two native 64-bit subtractions or comparisons.

Not sure why you cannot find the documentation, all of this is in the Intel installation instructions installation guide .

+2
source share

All Articles