Skip Zero Check in IEEE 754

I read this in a popular book on computer graphics,

There are many numerical calculations that become much easier if the programmer uses the IEEE rules. For example, consider the expression:

a = 1 / (1 / b + 1 / c)

Such expressions arise with resistors and lenses. If division by zero caused the program to crash (as was the case in many systems before the IEEE floating point), then two if statements will be required to check for small or zero values ​​of b or c. Instead, an IEEE floating point, if b or c is zero, we get a zero value for a as desired.

But what about the case where b=+0 and c=-0 ? Then a=1/inf-inf=nan . Is the book wrong about this optimization, or am I misunderstanding something? It seems that we still need at least one check for signs b and c, and not for the absence of checks.

Edit One of the suggestions in the comments was to do a post-check for NaN. Is this an idiomatic way to “take advantage” of this type of number?

 bool is_nan(float x) { return x != x; } float optic_thing(float b, float c) { float result = 1.0f / (1.0f/b + 1.0f/c); if (is_nan(result)) result = 0; return result; } 
+5
source share
1 answer

But what about the case where b=+0 and c=-0 ?

Convert -0.0 to +0.0 without branching by adding 0.0 .

 int main(void) { double b = +0.0; double c = -0.0; printf("%e\n", b); printf("%e\n", c); printf("%e\n", 1.0/(1.0/b + 1.0/c)); b += 0.0; c += 0.0; printf("%e\n", 1.0/(1.0/b + 1.0/c)); return 0; } 

Output

 0.000000e+00 -0.000000e+00 nan 0.000000e+00 

[Edit] On the other hand, any values ​​around 0.0 , but not 0.0 , are most likely numerical artifacts, not exact resistance values. The above still has problems with tiny value pairs, such as b = DBL_TRUE_MIN; c = -b; b = DBL_TRUE_MIN; c = -b; Problem: 1.0/tinyInfinity and +Infinity + -InfinityNAN . You can use either a wider floating point to calculate the quotient, or narrow operands.

  b += 0.0; c += 0.0; printf("%Le\n", 1.0/(1.0L/b + 1.0L/c)); // Lose some precision, but we are talking about real resistors/lenses here. b = (float) b + 0.0f; c = (float) c + 0.0f; printf("%e\n", 1.0/(1.0/b + 1.0/c)); 
+2
source

All Articles