Is floating point == ever OK?

Just today I came across third-party software that we use, and there was something like this in their code example:

 // defined in somewhere.h
 static const double BAR = 3.14;

 // code elsewhere.cpp
 void foo(double d)
 {
     if (d == BAR) ...
 }

I am aware of the problems with floating-point numbers and their representation, but I wondered: are there cases where float == float is okay? I am not asking when it might happen to work, but when it makes sense and works reliably.

Also, what about a call like foo(BAR)? Would it always compare equal, since both use the same static const BAR?

+50
c++ comparison floating-point
Jan 13 '11 at 17:05
14 answers

There are two ways to answer this question:

  • Are there cases where float == float gives the correct result?
  • Are there cases where float == float is acceptable coding practice?

The answer to (1): Yes, sometimes. But it will be fragile, which leads to the answer to (2): No. Don't do it. You are asking for bizarre bugs in the future.

As for a call of the form foo(BAR): in this particular case the comparison will return true, but when you are writing foo you do not know (and should not depend on) how it is called. For example, calling foo(BAR) would be fine, but foo(BAR * 2.0 / 2.0) (or possibly even foo(BAR * 1.0), depending on how much the compiler optimizes away) would break. You should not rely on the caller not performing any arithmetic!

In short, even though a == b will work in some cases, you really should not rely on it. Even if you can guarantee the calling semantics today, you may not be able to guarantee them next week, so save yourself the pain and do not use ==.
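To make the fragility concrete, here is a minimal sketch of my own (not from the answer above; it assumes IEEE 754 doubles): the "same" value computed in two different ways need not compare equal.

 #include <cstdio>

 int main() {
     double a = 0.1 + 0.2;   // each constant is rounded, then the sum is rounded again
     double b = 0.3;         // rounded only once
     std::printf("%d\n", a == b);             // prints 0 on IEEE 754 doubles
     std::printf("%.17g vs %.17g\n", a, b);   // 0.30000000000000004 vs 0.29999999999999999
     return 0;
 }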

In my opinion, float == float is never* OK because it is pretty much unmaintainable.

* For small values of "never".

+31
Jan 13 '11 at 17:10

Yes, you are guaranteed that whole numbers, including 0.0, compare equal with ==.

Of course, you have to be careful how you got the whole number in the first place: assignment is safe, but the result of any calculation is suspect.

ps. There are a bunch of real numbers that do have an exact representation as a float (think 1/2, 1/4, 1/8, etc.), but you probably do not know in advance that you have one of these.

Just to clarify: IEEE 754 guarantees that floating-point representations of whole numbers within range are exact.

 float a = 1.0;
 float b = 1.0;
 a == b   // true

But you have to be careful how you get your whole numbers:

 float a = 1.0 / 3.0;
 a * 3.0 == 1.0   // not true !!
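A self-contained sketch along the same lines (my own illustration, not from this answer; it assumes IEEE 754 doubles): assigned whole numbers compare exactly, while a computed value generally does not.

 #include <cstdio>

 int main() {
     double a = 42.0, b = 42.0;        // plain assignments of a whole number
     std::printf("%d\n", a == b);      // 1: exact

     double s = 0.0;
     for (int i = 0; i < 10; ++i)
         s += 0.1;                     // each step introduces a rounding error
     std::printf("%d\n", s == 1.0);    // 0 on IEEE 754 doubles: s ends up as 0.99999999999999989
     return 0;
 }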
+33
Jan 13 '11

Other answers explain why using == for fp numbers is dangerous. I just found one example that illustrates these dangers well, I suppose.

On the x86 platform you can get strange fp results for some calculations that are not due to rounding problems inherent in the computations you perform. This simple C program sometimes prints "error":

 #include <stdio.h>

 void test(double x, double y)
 {
     const double y2 = x + 1.0;
     if (y != y2)
         printf("error\n");
 }

 int main(void)
 {
     const double x = .012;
     const double y = x + 1.0;
     test(x, y);
     return 0;
 }

The program essentially just calculates

 x = 0.012 + 1.0; y = 0.012 + 1.0; 

(just spread across two functions and with intermediate variables), but the comparison can still yield false!

The reason is that on the x86 platform, programs typically use the x87 FPU for FP calculations. The x87 computes internally with more precision than a regular double, so double values must be rounded when they are stored to memory. That means the round trip x87 → RAM → x87 loses precision, and so calculation results differ depending on whether intermediate results passed through RAM or stayed in FPU registers all along. This is, of course, the compiler's decision, so the bug only shows up for certain compilers and optimization settings :-(.

See the GCC bug for more details: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=323

Rather scary ...

Note:

Bugs of this kind are usually quite hard to debug, because the differing values become the same once they hit RAM.

So if, for example, you extend the above program to actually print out the bit patterns of y and y2 right after comparing them, you will get the exact same value. To print the value, it has to be loaded into RAM to be passed to some print function like printf, and that makes the difference disappear...

+12
Feb 10 '11 at 11:12

I will try to provide a more or less realistic example of legitimate, meaningful and useful testing for float equality.

 #include <stdio.h>
 #include <math.h>

 /* let's try to numerically solve a simple equation F(x)=0 */
 double F(double x) {
     return 2 * cos(x) - pow(1.2, x);
 }

 /* I'll use a well-known, simple & slow but extremely smart method to do this */
 double bisection(double range_start, double range_end) {
     double a = range_start;
     double d = range_end - range_start;
     int counter = 0;
     while (a != a + d) // <-- WHOA!!
     {
         d /= 2.0;
         if (F(a) * F(a + d) > 0) /* test for same sign */
             a = a + d;
         ++counter;
     }
     printf("%d iterations done\n", counter);
     return a;
 }

 int main() {
     /* we must be sure that the root can be found in [0.0, 2.0] */
     printf("F(0.0)=%.17f, F(2.0)=%.17f\n", F(0.0), F(2.0));

     double x = bisection(0.0, 2.0);

     printf("the root is near %.17f, F(%.17f)=%.17f\n", x, x, F(x));
 }

I would prefer not to explain the bisection method itself, but to emphasize the stopping condition. It has exactly the form under discussion: (a == a+d), where both sides are floats: a is our current approximation of the equation's root, and d is our current precision. Given the precondition of the algorithm - that the root must lie between range_start and range_end - we guarantee at every iteration that the root stays between a and a+d, while d is cut in half at each step, narrowing the bounds.

And then, after a number of iterations, d becomes so small that when added to a it rounds to zero! That is, a+d turns out to be closer to a than to any other float; and so the FPU rounds it to the nearest representable value: to a itself. This is easy to illustrate with a calculation on a hypothetical machine; let it have a 4-digit decimal mantissa and some large exponent range. What result should the machine give for 2.131e+02 + 7.000e-3? The exact answer is 213.107, but our machine cannot represent such a number; it has to round it. And 213.107 is much closer to 213.1 than to 213.2 - so the rounded result becomes 2.131e+02 - the little addend vanished, rounded away to zero. The same thing is guaranteed to happen at some iteration of our algorithm - and at that point we cannot continue any further. We have found the root to the greatest possible precision.
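The same effect is easy to reproduce with real doubles; a tiny sketch of my own (not from the answer, assuming IEEE 754 doubles):

 #include <cstdio>

 int main() {
     double a = 213.1;
     double d = 1e-20;                   // far below half an ulp of a
     std::printf("%d\n", a + d == a);    // 1: the small addend is rounded away
     return 0;
 }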

The instructive conclusion, apparently, is that floats are tricky. They look so much like real numbers that every programmer is tempted to think of them as real numbers. But they are not. They have their own behavior, somewhat reminiscent of the real numbers, but not quite the same. You have to be very careful with them, especially when comparing for equality.




Update

Revisiting this answer after a while, I have also noticed an interesting fact: in the algorithm above you cannot actually use any "small number" in the stopping condition. For any choice of the number, there will be inputs for which your choice is too large, causing loss of precision, and there will be inputs for which your choice is too small, causing excess iterations or even an infinite loop. A detailed discussion follows.

You may already know that there is no such thing as a "small number" in calculus: for any real number, you can easily find infinitely many even smaller ones. The problem is that one of those "even smaller" ones may be exactly what we are looking for; it may be the root of our equation. Worse, different equations can have different roots (e.g. 2.51e-8 and 1.38e-8), both of which will be approximated by the same number if our stopping condition looks like d < 1e-6. Whichever "small number" you choose, many roots that would have been found correctly, with maximum precision, under the stopping condition a == a+d will be spoiled because the "epsilon" is too large.

It is true, however, that the exponent has a limited range in floating-point numbers, so one can actually find the smallest nonzero positive FP number (e.g. the 1e-45 denormal for IEEE 754 single precision). But it is useless! while (d >= 1e-45) {...} will loop forever with single-precision (positive nonzero) d.

Leaving such pathological edge cases aside, any choice of a "small number" in the stopping condition d < eps will be too small for some equations. In equations whose root has a high enough exponent, the result of subtracting two mantissas that differ only in the least significant digit will easily exceed our epsilon. For example, with a 6-digit mantissa, 7.00023e+8 - 7.00022e+8 = 0.00001e+8 = 1.00000e+3 = 1000, which means that the smallest possible difference between numbers with an exponent of +8 and a 6-digit mantissa is... 1000! Which will never fit into, say, 1e-4. For these numbers with a (relatively) high exponent we simply do not have enough precision to ever see a difference of 1e-4.
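To translate the toy 6-digit example into real doubles (my own sketch, not from the answer): near 1e12 the gap between adjacent doubles is already bigger than 1e-4, so a fixed epsilon of 1e-4 could never be reached there.

 #include <cstdio>
 #include <cmath>

 int main() {
     double a = 1.0e12;
     double ulp = std::nextafter(a, INFINITY) - a;   // spacing of doubles at this magnitude
     std::printf("ulp near 1e12 = %g\n", ulp);       // about 0.00012207
     std::printf("%d\n", ulp < 1e-4);                // 0: the gap already exceeds 1e-4
     return 0;
 }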

My implementation above takes this last issue into account as well: you can see that d is halved at each step, instead of being recomputed as the difference between a and b (which may be hugely different in magnitude). So if we change the stopping condition to d < eps, the algorithm will not get stuck in an infinite loop for huge roots (it very well could with (b-a) < eps), but it will still perform unneeded iterations while shrinking d below the precision of a.

This kind of reasoning may seem overly theoretical and needlessly deep, but its purpose is, once again, to illustrate the trickiness of floats. One has to be very careful about their finite precision when writing arithmetic operators around them.

+7
Feb 09

Perfect for integral values, even in floating-point formats

But the short answer is: "No, do not use ==."

Oddly enough, the floating-point format works "perfectly", i.e. with exact precision, when operating on integral values within the range of the format. This means that if you stick with double values, you get perfectly good integers of a little over 50 bits, giving you about ±4,500,000,000,000,000, or 4.5 quadrillion.

In fact, this is exactly how JavaScript works internally, and it is why JavaScript can do things like + and - on really big numbers, but can only do << and >> on 32-bit ones.

Strictly speaking, you can also exactly compare sums and products of numbers with exact representations. That would be all the integers, plus fractions made up of 1/2^n terms. So a loop incrementing by n + 0.25, n + 0.50, or n + 0.75 would be fine, but not by any of the other 96 two-digit decimal fractions.
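A quick sketch of that claim (my own, assuming IEEE 754 doubles): increments of 0.25 stay exact, increments of 0.1 do not.

 #include <cstdio>

 int main() {
     double quarters = 0.0, tenths = 0.0;
     for (int i = 0; i < 40; ++i) quarters += 0.25;   // every partial sum is exactly representable
     for (int i = 0; i < 100; ++i) tenths += 0.1;     // 0.1 is not, so error accumulates
     std::printf("%d %d\n", quarters == 10.0, tenths == 10.0);   // 1 0
     return 0;
 }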

So the answer is: while exact equality can in theory make sense in narrow cases, it is best avoided.

+6
Jan 13 '11

The only time I've ever used == (or !=) for a float is the following:

 if (x != x)
 {
     // Here x is guaranteed to be Not a Number
 }

and I have to admit I am guilty of using Not A Number as a magic floating-point constant (using numeric_limits<double>::quiet_NaN() in C++).

There is no point in comparing floating-point numbers for strict equality. Floating-point numbers were designed with predictable limits of relative accuracy. You are responsible for knowing what precision to expect of them and of your algorithms.
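A self-contained sketch of that NaN trick (mine, not from the answer):

 #include <cstdio>
 #include <limits>

 int main() {
     double x = std::numeric_limits<double>::quiet_NaN();
     double y = 1.5;
     std::printf("%d\n", x != x);   // 1: NaN is the only value that is unequal to itself
     std::printf("%d\n", y != y);   // 0: an ordinary value always equals itself
     return 0;
 }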

+5
Jan 13 '11 at 17:12

It is probably OK if you are never going to compute the value before comparing it. If you are testing whether a floating-point number is exactly pi, or -1, or 1, and you know those are the only values being passed in...

+4
Jan 13 '11 at 17:09

I have also used it a few times when rewriting a few algorithms into multithreaded versions. I used a test that compared the results of the single-threaded and multithreaded versions to make sure that both give exactly the same result.
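A minimal sketch of that kind of regression check (my own; the names are made up): the two versions must produce bit-identical output.

 #include <cassert>
 #include <cstddef>
 #include <vector>

 void check_identical(const std::vector<double>& single_threaded,
                      const std::vector<double>& multi_threaded) {
     assert(single_threaded.size() == multi_threaded.size());
     for (std::size_t i = 0; i < single_threaded.size(); ++i)
         assert(single_threaded[i] == multi_threaded[i]);   // exact equality is the whole point here
 }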

+2
Jan 17

Yes. 1/x is valid unless x == 0. You do not need an imprecise test here. 1/0.00000001 is perfectly fine. I cannot think of any other case - you cannot even check tan(x) against x == PI/2.
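A sketch of that zero guard (my own; the function name is made up):

 #include <limits>

 // Return 1/x, treating x == 0 as a special case.
 double reciprocal(double x) {
     if (x == 0.0)   // an exact test is exactly what we want here
         return std::numeric_limits<double>::infinity();
     return 1.0 / x; // fine even for tiny nonzero x such as 0.00000001
 }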

+1
Jan 14

Let's say you have a function that scales an array of floats by a constant factor:

 void scale(float factor, float *vector, int extent)
 {
     int i;
     for (i = 0; i < extent; ++i) {
         vector[i] *= factor;
     }
 }

I will assume that your floating-point implementation can represent 1.0 and 0.0 exactly, and that 0.0 is represented with all 0 bits.

If factor is exactly 1.0, then this function is a no-op, and you can return without doing any work. If factor is exactly 0.0, then this can be implemented with a call to memset, which is likely to be faster than performing the individual floating-point multiplications.
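A sketch of those fast paths (my own, not the netlib code; it relies on the all-zero-bits assumption stated above):

 #include <cstring>

 void scale_fast(float factor, float *vector, int extent)
 {
     if (factor == 1.0f)
         return;                                            // no-op: nothing to do
     if (factor == 0.0f) {
         std::memset(vector, 0, sizeof(float) * extent);    // all-zero bits is 0.0f by assumption
         return;
     }
     for (int i = 0; i < extent; ++i)
         vector[i] *= factor;
 }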

The netlib reference implementation of BLAS functions makes extensive use of such methods.

+1
Apr 29

Other posts show where it is appropriate. I think using bit-exact comparisons to avoid unnecessary calculations is also fine.

Example:

 float someFunction(float argument)
 {
     // cached state, persists between calls
     static float lastargument;
     static float cachedValue;

     // I really want bit-exact comparison here!
     if (argument != lastargument) {
         lastargument = argument;
         cachedValue = very_expensive_calculation(argument);
     }
     return cachedValue;
 }
+1
Jul 17

I know this is an old thread, but I would say that comparing floats for equality is OK when a false negative is acceptable.

Suppose, for example, that you have a program that prints floating-point values, and that if the value is exactly equal to M_PI you want it to print "pi" instead. If the value deviates by a tiny bit from the exact double representation of M_PI, it will print the double value instead, which is equally valid but just slightly less readable for the user.
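A little sketch of that idea (my own; it assumes M_PI from <cmath> is available):

 #include <cmath>
 #include <cstdio>

 void print_value(double v) {
     if (v == M_PI)                 // false negatives are fine: we just print the digits instead
         std::printf("pi\n");
     else
         std::printf("%.17g\n", v);
 }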

0
Feb 17

In my opinion, comparing for equality (or some equivalence) is a requirement in most situations: standard C++ containers or algorithms with an implied equality-comparison functor, like std::unordered_set, require this comparator to be an equivalence relation (see UnorderedAssociativeContainer). Unfortunately, comparing with an epsilon, as in abs(a - b) < epsilon, does not yield an equivalence relation, because it loses transitivity. This most likely leads to undefined behavior; in particular, two "almost equal" floating-point numbers can produce different hashes, which can leave the unordered_set in an invalid state. Personally, I would use == for floating-point numbers most of the time, as long as neither operand is the result of any FPU computation. With containers and container algorithms where only reads and writes are involved, == (or any equivalence relation) is the safest.
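A small sketch of the transitivity problem (my own, with an arbitrary epsilon of 0.001): "almost equal" is not an equivalence relation.

 #include <cmath>
 #include <cstdio>

 int main() {
     const double eps = 0.001;
     double a = 0.0000, b = 0.0009, c = 0.0018;
     std::printf("%d\n", std::fabs(a - b) < eps);   // 1
     std::printf("%d\n", std::fabs(b - c) < eps);   // 1
     std::printf("%d\n", std::fabs(a - c) < eps);   // 0: a ~ b and b ~ c, yet not a ~ c
     return 0;
 }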

abs(a - b) < epsilon is more or less a convergence criterion, similar to a limit. I find this relation useful when I need to verify that a mathematical identity holds between two computations (for example, PV = nRT, or distance = time * speed).

In short, use == if and only if no floating-point computation is involved; never use abs(a - b) < e as an equality predicate.

0
Jun 30 '17 at 20:42

I have a drawing program that basically uses floating point for its coordinate system, since the user is allowed to work at any level of detail/zoom. What they draw contains lines that can be bent at points they create. When they drag one point on top of another, the points are merged.

To do a "proper" floating-point comparison I would have to come up with some range within which points should be considered the same. Since the user can zoom in indefinitely and work within that range, and since I could not get anyone to commit to any particular range, we simply use '==' to see whether the points are the same. Occasionally there is a problem where points that are supposed to be exactly the same are off by .000000000001 or so (especially around 0.0), but usually it works fine. It is supposed to be hard to merge points without snapping turned on anyway... or at least that is how the original version worked.

It throws the testing group off from time to time, but that is their problem :p

Anyhow, there is an example of a possibly reasonable time to use '=='. The thing to note is that the decision is less about technical accuracy than about the client's wishes (or lack thereof) and convenience. It is not something that needs to be all that accurate anyway. So what if two points do not merge when you expect them to? It is not the end of the world and will not affect 'calculations'.

-3
Jan 13 '11 at 18:14


