I am working on an application that converts float samples in the range -1.0 to 1.0 to 16-bit signed integers. To verify that the optimized (SSE) routines produce correct output, I wrote a test suite that runs a plain scalar version against the SSE version and compares their output.
Before starting, I confirmed that the SSE rounding mode is set to round-to-nearest.
In my test case, the formula is:

```
ratio  = 65536 / 2
output = round(input * ratio)
```
For the most part the results are accurate, but the test fails on one particular input: -0.8499908447265625.
-0.8499908447265625 * (65536 / 2) = -27852.5
The normal (scalar) code correctly rounds this value to -27853, but the SSE code rounds it to -27852.
Here is the SSE code:
```c
void Float_S16(const float *in, int16_t *out, const unsigned int samples)
{
  static float ratio = 65536.0f / 2.0f;
  static __m128 mul = _mm_set_ps1(ratio);

  for(unsigned int i = 0; i < samples; i += 4, in += 4, out += 4)
  {
    __m128 xin;
    __m128i con;

    xin = _mm_load_ps(in);
    xin = _mm_mul_ps(xin, mul);
    con = _mm_cvtps_epi32(xin);

    out[0] = _mm_extract_epi16(con, 0);
    out[1] = _mm_extract_epi16(con, 2);
    out[2] = _mm_extract_epi16(con, 4);
    out[3] = _mm_extract_epi16(con, 6);
  }
}
```
Here is a self-contained example, as requested:
```c
#include <math.h>
#include <stdio.h>
#include <stdint.h>
#include <emmintrin.h>

int main(void)
{
  float ratio = 65536.0f / 2.0f;
  /* 16-byte alignment is required for _mm_load_ps */
  __attribute__((aligned(16))) float in[4] = {-1.0, -0.8499908447265625, 0.0, 1.0};
  int16_t out[4];

  for(int i = 0; i < 4; ++i)
    out[i] = round(in[i] * ratio);

  __m128 mul = _mm_set_ps1(ratio);
  __m128 xin;
  __m128i con;

  xin = _mm_load_ps(in);
  xin = _mm_mul_ps(xin, mul);
  con = _mm_cvtps_epi32(xin);

  int16_t outSSE[4];
  outSSE[0] = _mm_extract_epi16(con, 0);
  outSSE[1] = _mm_extract_epi16(con, 2);
  outSSE[2] = _mm_extract_epi16(con, 4);
  outSSE[3] = _mm_extract_epi16(con, 6);

  printf("Standard = %d, SSE = %d\n", out[1], outSSE[1]);
  return 0;
}
```