Can anyone explain this floating point behavior?

Inspired by this question, I tried to find out exactly what is going on there (my answer to that question was more intuitive, but I can't pin down exactly why it behaves this way).

I believe it comes down to the following (using 64-bit Python):

>>> import sys
>>> sys.maxint
9223372036854775807
>>> float(sys.maxint)
9.2233720368547758e+18

Python uses the IEEE 754 floating-point representation, which has 53 bits of significand. However, as far as I can tell, the significand in the above example would seem to require 57 bits (56 if you drop the implied leading 1) to be represented. Can someone explain this discrepancy?
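For reference, the arithmetic behind that 57-bit estimate can be checked directly (a sketch in Python 3, where sys.maxint no longer exists and repr prints the shortest round-tripping form, hence fewer digits than the Python 2 output above):

 >>> import math
 >>> float(2**63 - 1)               # sys.maxint on a 64-bit Python 2 build
 9.223372036854776e+18
 >>> math.ceil(17 * math.log2(10))  # bits needed for 17 decimal digits
 57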

+4
3 answers

Perhaps the following will help clarify the situation:

 >>> hex(int(float(sys.maxint)))
 '0x8000000000000000L'

This shows that float(sys.maxint) is actually a power of 2. Therefore, in binary, its mantissa is exactly 1. In IEEE 754, the leading 1. is implied, so in the machine representation this number's mantissa consists of all zero bits.

In fact, the IEEE bit pattern representing this number is as follows:

 0x43E0000000000000 

Note that only the first three nibbles (sign and exponent) are nonzero. The significand consists entirely of zeros. So it requires neither 56 nor even 53 bits to be represented.
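If you want to verify that bit pattern yourself, the standard-library struct module can reinterpret the double's bytes as an integer (a sketch; 2**63 - 1 stands in for sys.maxint on Python 3):

 >>> import struct
 >>> bits, = struct.unpack('<Q', struct.pack('<d', float(2**63 - 1)))
 >>> hex(bits)
 '0x43e0000000000000'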

+6

You're mistaken. It requires only 1 bit:

 >>> (9.2233720368547758e+18).hex()
 '0x1.0000000000000p+63'
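math.frexp shows the same thing without reading hex: it splits a float into a mantissa in [0.5, 1) and a power-of-two exponent (a sketch):

 >>> import math
 >>> math.frexp(9.2233720368547758e+18)
 (0.5, 64)

The mantissa 0.5 is binary 0.1, i.e. a single significant bit, and 0.5 * 2**64 == 2**63.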
+2

When sys.maxint is converted to float or double, the result is exactly 0x1p63, because the significand holds only 24 or 53 bits (including the implicit bit), so the trailing bits cause rounding. (sys.maxint is 2^63 - 1, and it rounds to 2^63.)
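You can watch that rounding happen by converting back to int, which recovers the exact value the double holds (a sketch; 2**63 - 1 stands in for sys.maxint):

 >>> int(float(2**63 - 1))
 9223372036854775808
 >>> int(float(2**63 - 1)) == 2**63
 True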

Then, when you print this float, some subroutine formats it as a decimal numeral. To do this, it calculates digits to represent 2^63. The fact that it is able to print 9.2233720368547758e+18 does not imply that the original number contains bits that would distinguish it from 9.2233720368547759e+18. It merely means that the bits in it do represent 9.2233720368547758e+18 (approximately). In fact, the next representable double-precision floating-point number is 9223372036854777856 (approximately 9.2233720368547778e+18), which is 2^63 + 2048. So the low 11 bits of these integers are not present in the double; the formatter simply displays the number as if those bits were zero.
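The neighbouring double quoted above can be confirmed with math.nextafter, available since Python 3.9 (a sketch):

 >>> import math
 >>> int(math.nextafter(float(2**63), math.inf))
 9223372036854777856
 >>> 2**63 + 2048
 9223372036854777856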

+1
