If your processor does not have XOP, then there is no efficient way to compare 64-bit unsigned integers.
I took the following from Agner Fog's Vector Class Library. It shows how to compare unsigned 64-bit integers.
static inline Vec2qb operator > (Vec2uq const & a, Vec2uq const & b) { #ifdef __XOP__
So, if your processor supports XOP, you should try compiling with -mxop and see if the loop is vectorized.
Edit: if GCC does not vectorize the loop the way you want and your processor has XOP, you can do it by hand with intrinsics:
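    for (WorkerID = 0; WorkerID < WorkersON - 1; WorkerID += 2) {
        // Load two 64-bit counters.
        __m128i v = _mm_loadu_si128((__m128i*)&WorkerDataTime[WorkerID]);
        // XOP unsigned compare: all ones (-1) in each lane where v > 0.
        __m128i cmp = _mm_comgt_epu64(v, _mm_setzero_si128());
        // Adding -1 decrements only the nonzero lanes.
        v = _mm_add_epi64(v, cmp);
        _mm_storeu_si128((__m128i*)&WorkerDataTime[WorkerID], v);
    }
    // Scalar tail for the last element when WorkersON is odd.
    for (; WorkerID < WorkersON; ++WorkerID) {
        if (WorkerDataTime[WorkerID] > 0) WorkerDataTime[WorkerID] -= 1;
    }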
Compile with -mxop and add #include <x86intrin.h>.
Edit: as Nils Pipenbrinck pointed out, if you do not have XOP, you can do this with one more instruction using _mm_xor_si128:
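    for (WorkerID = 0; WorkerID < WorkersON - 1; WorkerID += 2) {
        __m128i v = _mm_loadu_si128((__m128i*)&WorkerDataTime[WorkerID]);
        // All ones in each lane where v == 0 (_mm_cmpeq_epi64 requires SSE4.1).
        __m128i mask = _mm_cmpeq_epi64(v, _mm_setzero_si128());
        // Invert the mask: all ones (-1) where v != 0.
        mask = _mm_xor_si128(mask, _mm_set1_epi32(~0));
        // Adding -1 decrements only the nonzero lanes.
        v = _mm_add_epi64(v, mask);
        _mm_storeu_si128((__m128i*)&WorkerDataTime[WorkerID], v);
    }
    // Scalar tail for the last element when WorkersON is odd.
    for (; WorkerID < WorkersON; ++WorkerID) {
        if (WorkerDataTime[WorkerID] > 0) WorkerDataTime[WorkerID] -= 1;
    }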
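To see why the mask version works, it may help to model a single 64-bit lane in plain C. This is only a sketch of the arithmetic, not the SIMD code itself, and the function name dec_if_nonzero is made up for illustration:

```c
#include <stdint.h>

/* Scalar model of one 64-bit lane of the non-XOP loop.
   dec_if_nonzero is a hypothetical name used only for this illustration. */
static uint64_t dec_if_nonzero(uint64_t v)
{
    /* Per-lane result of an equality compare against zero:
       all ones if v == 0, otherwise all zeros. */
    uint64_t mask = (v == 0) ? UINT64_MAX : 0;

    /* XOR with all ones inverts the mask: all ones where v != 0. */
    mask ^= UINT64_MAX;

    /* All ones is -1 modulo 2^64, so this subtracts 1 from nonzero
       values and leaves zero untouched. */
    return v + mask;
}
```

For example, dec_if_nonzero(5) gives 4 and dec_if_nonzero(0) stays 0, which is the same result as the scalar `if (x > 0) x -= 1;` in the tail loop.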
Edit: based on Stephen Canon's comment, I found out that there is a more efficient way to compare general 64-bit unsigned integers, using the pcmpgtq instruction from SSE4.2:
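    __m128i a, b;
    // Flipping the sign bit maps unsigned order onto signed order.
    __m128i sign64 = _mm_set1_epi64x((long long)0x8000000000000000ULL);
    __m128i aflip = _mm_xor_si128(a, sign64);
    __m128i bflip = _mm_xor_si128(b, sign64);
    // Signed compare (pcmpgtq) of the flipped values equals unsigned a > b.
    __m128i cmp = _mm_cmpgt_epi64(aflip, bflip);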
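The sign-flip trick can be checked one lane at a time in scalar C. The sketch below is my own illustration (the name unsigned_gt_via_signed is hypothetical), and it assumes the usual two's complement behavior when converting an out-of-range uint64_t to int64_t, which is implementation-defined in C but holds on GCC, Clang and MSVC:

```c
#include <stdint.h>

/* Scalar model of the sign-flip comparison for one 64-bit lane.
   unsigned_gt_via_signed is a hypothetical name for illustration.
   Returns 1 if a > b as unsigned values, using only a signed compare. */
static int unsigned_gt_via_signed(uint64_t a, uint64_t b)
{
    const uint64_t sign64 = 0x8000000000000000ULL;

    /* XOR with the sign bit shifts the unsigned range [0, 2^64)
       onto the signed range [-2^63, 2^63) while preserving order.
       Assumes two's complement conversion (implementation-defined). */
    int64_t aflip = (int64_t)(a ^ sign64);
    int64_t bflip = (int64_t)(b ^ sign64);

    /* This signed compare is what pcmpgtq does per lane. */
    return aflip > bflip;
}
```

Without the flip, a plain signed compare would treat 0x8000000000000000 as negative and report it as smaller than 1. With the flip, the large value maps to 0 and 1 maps to a negative number, so the order comes out right.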