The compiler is far ahead of you. In your example:
float a = 1 / sqrtf(2 * M_PI);
float b = c / sqrtf(2 * M_PI);
nvopencc (Open64) produces the following:
mov.f32 %f2, 0f40206c99; // 2.50663
div.full.f32 %f3, %f1, %f2;
mov.f32 %f4, 0f3ecc422a; // 0.398942
which is equivalent to:
float b = c / 2.50663f;
float a = 0.398942f;
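As a rough host-side check of that equivalence, here is a minimal sketch in plain C (not CUDA). The function names are hypothetical and only for illustration. The literals are the rounded values from the PTX comments above, so the two forms should agree only to about six significant digits, not bit for bit.

```c
#include <math.h>

/* M_PI is not guaranteed by strict ISO C, so provide a fallback. */
#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

/* Hypothetical helper: a as written in the source, with the
   constant expression left for the compiler to fold. */
static float a_expression(void) { return 1 / sqrtf(2 * M_PI); }

/* Hypothetical helper: a with the folded literal from the PTX comment. */
static float a_folded(void) { return 0.398942f; }

/* Hypothetical helper: b as written in the source. */
static float b_expression(float c) { return c / sqrtf(2 * M_PI); }

/* Hypothetical helper: b with the divisor replaced by the folded
   literal from the PTX comment. */
static float b_folded(float c) { return c / 2.50663f; }
```

Calling `a_expression()` and `a_folded()`, or `b_expression(c)` and `b_folded(c)` for some `c`, and printing the results side by side should show differences only in the last printed digits.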
The second case compiles to:
float a = 1 / sqrtf(c * 3.14159f);
float b = c / 1.77245f;
I assume that the expression for a generated by the compiler should be more accurate than your "optimized" version, and run at about the same speed.