Implicit conversion from long long to float gives unexpected result

Question

In an attempt to verify (using VS2012) a claim from a book (the second sentence below) that

When we assign an integral value to an object of floating-point type, the fractional part is zero. Precision may be lost if the integer has more bits than the floating-point object can accommodate.

I wrote the following program:

 #include <iostream>
 #include <iomanip>

 using std::cout;
 using std::setprecision;

 int main()
 {
     long long i = 4611686018427387905; // 2^62 + 2^0
     float f = i;                       // implicit long long -> float conversion

     std::streamsize prec = cout.precision();
     cout << i << " " << setprecision(20) << f << setprecision(prec) << std::endl;
     return 0;
 }

Output:

 4611686018427387905 4611686018427387900

I was expecting output of the form

 4611686018427387905 4611690000000000000

How is a 4-byte float able to store so much information about an 8-byte integer? Is there a value for i that actually demonstrates the claim?


c++

Cohomologous Jan 4 '17 at 2:32


2 answers

Ianpudney · Answer 1 · 2017-01-04T02:41:24+0000

Floats do not store their data in base 10; they store it in base 2. Thus, 4611690000000000000 is actually not a very round number. This is its binary representation:

 100000000000000000000111001111100001000001110001010000000000000.

As you can see, storing that takes a lot of bits. However, the number that is actually printed has the following binary representation:

 11111111111111111111111111111111111111111111111111111111111100

As you can see, that is quite a round number in binary, and the fact that it is off by 4 from a power of two is most likely due to rounding in the convert-to-base-10 algorithm.

As an example of a number that does not fit in the float, try the number you expected:

 4611690000000000000

You will notice that it comes out very differently.

Mathsquared · Answer 2 · 2017-01-04T02:42:06+0000

A float stores so much information because you are working with a number that is so close to a power of 2.

The float format stores numbers in what is basically binary scientific notation. In your case, it is stored as something like

1.0000000 ... [61 zeros] ... 00000001 * 2^62.

The float format cannot store 62 binary places (a typical IEEE 754 float has a 24-bit significand), so the final 1 is cut off ... but we are left with 2^62, which almost exactly matches the number you are trying to store.

I am not good at coming up with examples, but CERT is; you can view an example of what happens with botched numeric conversions here. Note that the example is in Java, but C++ uses the same floating-point types. In addition, the first example is a conversion between a 4-byte int and a 4-byte float, but this only reinforces your point (there is less integer information to store than in your example, yet it still fails).
