I am trying to build a fast BPSK decoder using Intel's AVX intrinsics. I have a set of complex numbers stored as interleaved floats, but because of the BPSK modulation only the real part (the even-indexed floats) is needed. Each float x is mapped to 0 if x < 0 and to 1 if x >= 0. This is done with the following routine:
    static inline void
    normalize_bpsk_constellation_points(int32_t *out, const complex_t *in, size_t num)
    {
        static const __m256 _min_mask = _mm256_set1_ps(-1.0);
        static const __m256 _max_mask = _mm256_set1_ps(1.0);
        static const __m256 _mul_mask = _mm256_set1_ps(0.5);
        __m256 res;
        __m256i int_res;
        size_t i;
        gr_complex temp;
        float real;

        for (i = 0; i < num; i += COMPLEX_PER_AVX_REG) {
            res = _mm256_load_ps((float *)&in[i]);
            /* clamp to [-1, 1] */
            res = _mm256_max_ps(_min_mask, _mm256_min_ps(_max_mask, res));
            /* shift and scale to [0, 1] */
            res = _mm256_add_ps(res, _max_mask);
            res = _mm256_mul_ps(res, _mul_mask);
            /* round to nearest and convert to 32-bit integers */
            res = _mm256_round_ps(res, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC);
            int_res = _mm256_cvtps_epi32(res);
            _mm256_store_si256((__m256i *)&out[2 * i], int_res);
        }
    }
First, I clamp all the floats to the range [-1, 1]. Then, after adding 1 and multiplying by 0.5, the result is rounded to the nearest integer. This maps every scaled value above 0.5 to 1 and every scaled value below 0.5 to 0.
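For reference, the intended per-float mapping can be written as a scalar sketch. The names `bpsk_bit_scalar` and `decode_floats_scalar` are made up for illustration and are not part of the routine above. The sketch assumes the default floating-point rounding mode (round to nearest, ties to even), which is what `_MM_FROUND_TO_NEAREST_INT` selects, and it does not model NaN inputs.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar reference for one float: clamp to [-1, 1],
// shift and scale to [0, 1], then round to the nearest integer.
// NaN inputs are deliberately not modelled here.
static inline int32_t bpsk_bit_scalar(float x) {
    float clamped = std::max(-1.0f, std::min(1.0f, x));
    float scaled = (clamped + 1.0f) * 0.5f;
    // std::nearbyint honours the current rounding mode
    // (round-to-nearest-even unless it has been changed).
    return static_cast<int32_t>(std::nearbyint(scaled));
}

// Hypothetical helper: applies the mapping to n floats, one output per float,
// which mirrors the vector routine converting every lane it loads.
static inline void decode_floats_scalar(int32_t *out, const float *in, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = bpsk_bit_scalar(in[i]);
    }
}
```

One detail this makes visible: an input of exactly 0 scales to 0.5, and under round-to-nearest-even 0.5 rounds to 0, so x = 0 yields 0 rather than 1.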
The routine works fine if the input floats are normal numbers. However, because of some situations in earlier processing steps, some input floats may be NaN or -NaN. In that case the NaN values propagate through _mm256_max_ps(), _mm256_min_ps() and all the other AVX functions, which produces an integer output of -2147483648 and, of course, makes my program fail because of invalid indexing.
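The NaN behaviour can be reproduced with a per-lane scalar model, sketched below. All names (`lane_min`, `lane_max`, `lane_cvt`, `model_lane`) are hypothetical. The model assumes the documented x86 semantics: the packed min/max instructions return their second operand whenever either operand is NaN, and the float-to-int32 conversion returns the "integer indefinite" value 0x80000000 for NaN or out-of-range inputs. It also assumes the code is not compiled with fast-math options.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Model of one lane of _mm256_min_ps(a, b): if the comparison is false,
// including when either operand is NaN, the second operand b is returned.
static inline float lane_min(float a, float b) { return a < b ? a : b; }

// Model of one lane of _mm256_max_ps(a, b), same rule for NaN.
static inline float lane_max(float a, float b) { return a > b ? a : b; }

// Model of one lane of _mm256_cvtps_epi32: NaN and out-of-range values
// become the "integer indefinite" 0x80000000, i.e. -2147483648.
static inline int32_t lane_cvt(float x) {
    if (std::isnan(x) || x >= 2147483648.0f || x < -2147483648.0f) {
        return std::numeric_limits<int32_t>::min();
    }
    return static_cast<int32_t>(std::nearbyint(x));
}

// One lane of the routine, with the operands in the same order as above.
static inline int32_t model_lane(float x) {
    float r = lane_max(-1.0f, lane_min(1.0f, x)); // NaN in x survives both calls
    r = (r + 1.0f) * 0.5f;                        // arithmetic keeps NaN as NaN
    r = std::nearbyint(r);                        // rounding keeps NaN as NaN
    return lane_cvt(r);
}
```

In this model a NaN lane comes out as -2147483648, matching the observed value, while ordinary lanes still map to 0 or 1. The model also suggests that the operand order of the min/max calls is significant, since a NaN placed in the first operand would be replaced by the second.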
Is there a workaround to avoid this problem, or at least a way to set NaN to 0 using AVX?