128-bit SSE counter?

I need a function of a __m128i variable that has a period of 2^128. It does not need to be monotonically increasing (like a counter), but it must visit each value exactly once.

The simplest example that I could think of is actually a 128-bit counter, but I found it difficult to implement in SSE. Are there simpler / faster solutions?

+7
2 answers

Here is a monotonic counter. I'm not sure if you can call it simple, though.

Assuming both ONE and ZERO are always in registers, this should compile to 5 instructions (7 or 8 if VEX encoding is not used).

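    #include <smmintrin.h>  // SSE4.1, needed for _mm_cmpeq_epi64

    inline __m128i nextc(__m128i x){
        const __m128i ONE = _mm_setr_epi32(1,0,0,0);
        const __m128i ZERO = _mm_setzero_si128();

        x = _mm_add_epi64(x,ONE);             // increment the low 64-bit lane
        __m128i t = _mm_cmpeq_epi64(x,ZERO);  // all ones in each lane that is zero
        t = _mm_and_si128(t,ONE);             // 1 in the low lane if it wrapped
        t = _mm_unpacklo_epi64(ZERO,t);       // move that carry to the high lane
        x = _mm_add_epi64(x,t);               // add the carry to the high lane

        return x;
    }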

Test Code (MSVC):

    #include <iostream>
    using namespace std;

    int main() {
        __m128i x = _mm_setr_epi32(0xfffffffa,0xffffffff,1,0);

        int c = 0;
        while (c++ < 10){
            // m128i_u64 is an MSVC-specific member of __m128i
            cout << x.m128i_u64[0] << "  " << x.m128i_u64[1] << endl;
            x = nextc(x);
        }

        return 0;
    }

Output:

    18446744073709551610  1
    18446744073709551611  1
    18446744073709551612  1
    18446744073709551613  1
    18446744073709551614  1
    18446744073709551615  1
    0  2
    1  2
    2  2
    3  2

A slightly better version, suggested by @Norbert P. It saves one instruction compared with my original solution.

    inline __m128i nextc(__m128i x){
        const __m128i ONE = _mm_setr_epi32(1,0,0,0);
        const __m128i ZERO = _mm_setzero_si128();

        x = _mm_add_epi64(x,ONE);
        __m128i t = _mm_cmpeq_epi64(x,ZERO);
        t = _mm_unpacklo_epi64(ZERO,t);  // high lane = -1 if the low lane wrapped
        x = _mm_sub_epi64(x,t);          // subtracting -1 adds the carry

        return x;
    }
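
Note that `_mm_cmpeq_epi64` is an SSE4.1 intrinsic, and the `m128i_u64` member used in the test code is specific to MSVC. As a rough sketch of how the same carry trick might look with SSE2 only, the variant below emulates the 64-bit compare with a 32-bit compare plus a shuffle. The names `nextc_sse2` and `to_halves` are hypothetical, introduced here for illustration; this is not the answerer's code, and it costs more instructions than the versions above.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>

// Hypothetical SSE2-only variant of nextc (the name is an assumption).
inline __m128i nextc_sse2(__m128i x) {
    const __m128i ONE  = _mm_setr_epi32(1, 0, 0, 0);
    const __m128i ZERO = _mm_setzero_si128();

    x = _mm_add_epi64(x, ONE);             // low 64-bit lane += 1
    __m128i t = _mm_cmpeq_epi32(x, ZERO);  // "== 0" mask per 32-bit lane
    // AND each 32-bit mask with its neighbour: a 64-bit lane becomes all ones
    // only if both of its 32-bit halves are zero.
    t = _mm_and_si128(t, _mm_shuffle_epi32(t, _MM_SHUFFLE(2, 3, 0, 1)));
    t = _mm_unpacklo_epi64(ZERO, t);       // move the low-lane mask to the high lane
    return _mm_sub_epi64(x, t);            // subtracting -1 adds the carry
}

// Portable stand-in for the MSVC-only m128i_u64 member:
// out[0] receives the low half, out[1] the high half.
inline void to_halves(__m128i x, std::uint64_t out[2]) {
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), x);
}
```

Starting from a low half of all ones and a high half of 1, one call should leave the low half at 0 and the high half at 2, matching the wrap shown in the output above.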
+5

Never forget the KISS principle.

Pasting this (unsigned integers are required by the C standard to wrap around, so each value is visited exactly once):

    __uint128_t inc(__uint128_t x) {
        return x+1;
    }

into a compiler gives (for x64):

    addq    $1, %rdi
    adcq    $0, %rsi
    movq    %rdi, %rax
    movq    %rsi, %rdx
    ret

Easy/fast enough? If you inline this, you can probably get away with just the addq/adcq (the movq instructions and the ret are required by the x64 ABI; if the function is inlined, they are not needed).
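
To sanity-check the wraparound claim, here is a minimal sketch, assuming GCC or Clang (which provide `__uint128_t` as an extension). `make_u128` is a hypothetical helper added purely for illustration; it builds a 128-bit value from two 64-bit halves.

```cpp
#include <cstdint>

// Same function as above.
inline __uint128_t inc(__uint128_t x) { return x + 1; }

// Hypothetical helper: assemble a 128-bit value from its high and low halves.
inline __uint128_t make_u128(std::uint64_t hi, std::uint64_t lo) {
    return (static_cast<__uint128_t>(hi) << 64) | lo;
}
```

With these definitions, `inc(make_u128(1, UINT64_MAX))` equals `make_u128(2, 0)`, and incrementing the all-ones value wraps around to zero.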


To address Voo's comment about MSVC's shortcomings, you can use the following:

    inline void inc(unsigned long long *x, unsigned long long *y)
    {
        if (!++*x) ++*y; // yay for obfuscation!
    }

I don't have an MSVC installation at hand, so I can't test it, but it should give something similar to what I posted above. Then, if you really need the __m128i, you should be able to cast the two halves.
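
One possible way to combine the two halves, sketched under the assumption that SSE2 is available: `_mm_set_epi64x` takes the high half first and the low half second. The name `to_m128i` is hypothetical, and `inc` is repeated from above only so that the snippet is self-contained.

```cpp
#include <emmintrin.h>  // SSE2: _mm_set_epi64x

// Two-word increment from above: x points to the low half, y to the high half.
inline void inc(unsigned long long *x, unsigned long long *y) {
    if (!++*x) ++*y;  // carry into the high half when the low half wraps to 0
}

// Hypothetical helper: pack the two halves into one SSE register.
inline __m128i to_m128i(unsigned long long lo, unsigned long long hi) {
    return _mm_set_epi64x(static_cast<long long>(hi), static_cast<long long>(lo));
}
```

Keeping the counter in two scalar words and packing only when the vector is needed keeps the increment itself down to the add-with-carry pair shown earlier, at the cost of moving the result into an SSE register.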

+4
