So you want to know whether any of the values is greater than 4?
Yes, there are SIMD instructions for this. It's a pity that auto-vectorization can't cope with this scenario. Here's a vectorized algorithm:
    diff_v = end_v - start_v;         // _mm_hsub_epi16
    floor_v = max(4_v, diff_v);       // _mm_max_epi16
    if (floor_v != 4_v) return true;  // wide scalar comparison
Use _mm_sub_epi16 with a structure of arrays, or _mm_hsub_epi16 with an array of structures.
In fact, since start is stored first in memory, _mm_hsub_epi16 will give you start_v - end_v, so use _mm_min_epi16 and a vector of -4 instead.
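For the structure-of-arrays case, the first sketch maps onto SSE2 intrinsics fairly directly. The following is only a rough illustration: the function name, the int16_t element type, and the requirement that n be a multiple of 8 are assumptions made for the example, not something taken from the question.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if end[i] - start[i] > 4 for any i.
   Assumes n is a multiple of 8 and that differences fit in int16_t. */
static bool any_gap_over_4_soa(const int16_t *start, const int16_t *end,
                               size_t n)
{
    assert(n % 8 == 0);
    const __m128i four = _mm_set1_epi16(4);
    for (size_t i = 0; i < n; i += 8) {
        __m128i s = _mm_loadu_si128((const __m128i *)(start + i));
        __m128i e = _mm_loadu_si128((const __m128i *)(end + i));
        __m128i diff = _mm_sub_epi16(e, s);           /* end - start */
        __m128i floor_v = _mm_max_epi16(four, diff);  /* stays 4 if diff <= 4 */
        /* "Wide comparison": all 16 mask bits set means every lane is still 4. */
        __m128i same = _mm_cmpeq_epi16(floor_v, four);
        if (_mm_movemask_epi8(same) != 0xFFFF)
            return true;
    }
    return false;
}
```

The cmpeq/movemask pair is one common way to do the "is the whole vector equal" test; other approaches exist.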
Each of these instructions works on 8 values at a time (note that _mm_hsub_epi16 requires SSSE3, while the plain subtract, min and max only need SSE2). It will still be fastest to return early rather than run the loop to completion. However, unrolling the loop a bit more may buy you extra speed (pass the first set of results to the packed min / max function to combine them).
So you end up with (approximately):

    most_negative = threshold = _mm_set1_epi16(-4);  // vectorized -4
    loop:
        a = load from range;
        b = load from range;
        diff = _mm_hsub_epi16(a, b);
        most_negative = _mm_min_epi16(most_negative, diff);
        // unroll by repeating the above four instructions 4 times or so
        if (most_negative != threshold) return true;
    repeat loop
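As a rough, compilable version of that loop for the array-of-structures layout, something like the following might work. The range16 struct, the function name, the int16_t fields, and the requirement that n be a multiple of 16 are all assumptions for illustration. The target attribute is a GCC/Clang convenience so the SSSE3 intrinsic can be used without a global -mssse3 flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <tmmintrin.h>  /* SSSE3: _mm_hsub_epi16 */

/* Hypothetical layout: start is stored first, then end. */
typedef struct { int16_t start, end; } range16;

#if defined(__GNUC__)
__attribute__((target("ssse3")))
#endif
static bool any_gap_over_4_aos(const range16 *r, size_t n)
{
    assert(n % 16 == 0);  /* unrolled by 2: 16 ranges per iteration */
    const __m128i threshold = _mm_set1_epi16(-4);
    for (size_t i = 0; i < n; i += 16) {
        const __m128i *p = (const __m128i *)(r + i);  /* 4 ranges per vector */
        /* hsub yields start - end for each (start, end) pair. */
        __m128i d0 = _mm_hsub_epi16(_mm_loadu_si128(p), _mm_loadu_si128(p + 1));
        __m128i d1 = _mm_hsub_epi16(_mm_loadu_si128(p + 2), _mm_loadu_si128(p + 3));
        /* Combine both result sets before doing a single check. */
        __m128i most_negative = _mm_min_epi16(threshold, d0);
        most_negative = _mm_min_epi16(most_negative, d1);
        __m128i same = _mm_cmpeq_epi16(most_negative, threshold);
        if (_mm_movemask_epi8(same) != 0xFFFF)
            return true;  /* some start - end < -4, i.e. end - start > 4 */
    }
    return false;
}
```

Unrolling further just means adding more hsub/min pairs before the check, at the cost of a coarser early exit.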