Comparison with NaN using AVX

I am trying to write a fast BPSK decoder using AVX intrinsics on Intel CPUs. I have a set of complex numbers stored as interleaved floats, but because of the BPSK modulation only the real part (i.e. the even-indexed floats) is needed. Each float x is mapped to 0 when x < 0 and to 1 when x >= 0. This is done with the following procedure:

    static inline void
    normalize_bpsk_constellation_points(int32_t *out, const complex_t *in, size_t num)
    {
        static const __m256 _min_mask = _mm256_set1_ps(-1.0);
        static const __m256 _max_mask = _mm256_set1_ps(1.0);
        static const __m256 _mul_mask = _mm256_set1_ps(0.5);
        __m256 res;
        __m256i int_res;
        size_t i;
        gr_complex temp;
        float real;

        for(i = 0; i < num; i += COMPLEX_PER_AVX_REG){
            res = _mm256_load_ps((float *)&in[i]);

            /* clamp them to avoid segmentation faults due to indexing */
            res = _mm256_max_ps(_min_mask, _mm256_min_ps(_max_mask, res));

            /* Scale accordingly for proper indexing -1->0, 1->1 */
            res = _mm256_add_ps(res, _max_mask);
            res = _mm256_mul_ps(res, _mul_mask);

            /* And then round to the nearest integer */
            res = _mm256_round_ps(res, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC);
            int_res = _mm256_cvtps_epi32(res);
            _mm256_store_si256((__m256i *) &out[2*i], int_res);
        }
    }

First, I clamp all the floats to the range [-1, 1]. Then, after the appropriate scaling, the result is rounded to the nearest integer. This maps every scaled value above 0.5 to 1 and every scaled value below 0.5 to 0.
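To pin down the intended mapping, here is a scalar sketch of what one lane of the loop above computes for ordinary (non-NaN) inputs. The name `bpsk_bit_scalar` is made up for illustration and is not part of the original code. The rounding step is written as a comparison on the assumption that `_MM_FROUND_TO_NEAREST_INT` rounds ties to even, so a scaled value of exactly 0.5 ends up as 0.

```c
#include <stdint.h>

/* Hypothetical scalar reference for a single real sample.
   Only meaningful for non-NaN inputs. */
static int32_t bpsk_bit_scalar(float x)
{
    /* Clamp to [-1, 1]. */
    if (x < -1.0f) x = -1.0f;
    if (x >  1.0f) x =  1.0f;

    /* Scale so that -1 -> 0 and 1 -> 1. */
    float scaled = (x + 1.0f) * 0.5f;

    /* For values in [0, 1], round-to-nearest-even gives 1 only when
       the value is strictly greater than 0.5 (the tie 0.5 rounds to 0). */
    return scaled > 0.5f ? 1 : 0;
}
```

One consequence worth noticing: an input of exactly 0 scales to 0.5 and therefore comes out as 0 under this scheme, not 1.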

The procedure works fine as long as the input floats are ordinary numbers. However, because of what happens in earlier processing steps, some input floats may be NaN or -NaN. In that case the NaN values propagate through _mm256_max_ps(), _mm256_min_ps() and all the other AVX operations, and the final integer conversion yields -2147483648, which of course crashes my program because of the invalid index.

Is there a workaround that avoids this problem, or at least a way to set NaN to 0 using AVX?

2 answers

You could do it the simple way to begin with: compare, then shift the mask down to 0 or 1 (untested):

    res = _mm256_cmp_ps(res, _mm256_setzero_ps(), _CMP_NLT_US);
    ires = _mm256_srli_epi32(_mm256_castps_si256(res), 31);

Or shift and xor (also untested):

    ires = _mm256_srli_epi32(_mm256_castps_si256(res), 31);
    ires = _mm256_xor_si256(ires, _mm256_set1_epi32(1));

This version looks only at the sign bit, so it also respects the sign of a NaN (and ignores the fact that it is a NaN).

An alternative that does not need AVX2 (untested):

    res = _mm256_cmp_ps(res, _mm256_setzero_ps(), _CMP_NLT_US);
    res = _mm256_and_ps(res, _mm256_set1_ps(1.0f));
    ires = _mm256_cvtps_epi32(res);
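For what it's worth, here is a rough sketch of how the AVX-only variant might look as a complete loop. The function name `bpsk_hard_decide_avx`, the flat `float` input and the unaligned loads and stores are my assumptions, not the asker's interface, and the `target("avx")` attribute assumes GCC or Clang. It produces one 0/1 per input float, like the original loop, and `num_floats` is assumed to be a multiple of 8.

```c
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: writes 1 for every lane that is not less than
   zero and 0 otherwise. num_floats must be a multiple of 8. */
__attribute__((target("avx")))
static void bpsk_hard_decide_avx(int32_t *out, const float *in, size_t num_floats)
{
    const __m256 zero = _mm256_setzero_ps();
    const __m256 one  = _mm256_set1_ps(1.0f);

    for (size_t i = 0; i < num_floats; i += 8) {
        __m256 v = _mm256_loadu_ps(in + i);

        /* All-ones where !(v < 0). The predicate is unordered, so a
           NaN lane also compares true. */
        __m256 mask = _mm256_cmp_ps(v, zero, _CMP_NLT_US);

        /* all-ones & 1.0f == 1.0f, all-zeros & 1.0f == 0.0f */
        __m256 bits = _mm256_and_ps(mask, one);

        /* 1.0f -> 1, 0.0f -> 0; no lane can be NaN at this point. */
        _mm256_storeu_si256((__m256i *)(out + i), _mm256_cvtps_epi32(bits));
    }
}
```

With `_CMP_NLT_US` a NaN lane comes out as 1, which is at least a valid index. If NaN should come out as 0, an ordered predicate such as `_CMP_GE_OQ` should do that, since ordered comparisons are false whenever an operand is NaN.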

Harold posted a good solution for the question you actually asked, but I want to point out that getting rid of NaN values while clamping is quite simple. If either argument is NaN, MINPS and MAXPS simply return the second argument. So all you have to do is swap the order of the arguments, and the NaNs get clamped as well. For example, the following will pin NaNs to _min_mask:

 res = _mm256_max_ps(_mm256_min_ps(_max_mask, res), _min_mask); 
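To see why the operand order matters, here is a scalar model of the per-lane rule. This is only a sketch: `minps_model`, `maxps_model` and the two clamp helpers are illustrative names, and the model simply encodes "return the second operand whenever the comparison is false", which covers every comparison involving NaN.

```c
/* Scalar model of one MINPS/MAXPS lane: the second operand is
   returned whenever the comparison is false, including when either
   operand is NaN. */
static float minps_model(float a, float b) { return a < b ? a : b; }
static float maxps_model(float a, float b) { return a > b ? a : b; }

/* Operand order used in the question: a NaN input survives the clamp. */
static float clamp_nan_propagates(float x)
{
    return maxps_model(-1.0f, minps_model(1.0f, x));
}

/* Operand order suggested here: a NaN input becomes -1. */
static float clamp_nan_to_min(float x)
{
    return maxps_model(minps_model(1.0f, x), -1.0f);
}
```

In both versions the inner min passes the NaN through, because the data is its second operand. The difference is in the outer max: with the constant as the second operand, the NaN is replaced by -1. Ordinary inputs are clamped identically either way.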