Printing the integral part of a floating point number

I am trying to figure out how to print floating point numbers without using library functions. Printing the fractional part of a floating point number was pretty simple. Printing the integral part is harder:

    static const int base = 2;
    static const char hex[] = "0123456789abcdef";

    void print_integral_part(float value)
    {
        assert(value >= 0);
        char a[129]; // worst case is 128 digits for base 2 plus NUL
        char * p = a + 128;
        *p = 0;
        do {
            int digit = fmod(value, base);
            value /= base;
            assert(p > a);
            *--p = hex[digit];
        } while (value >= 1);
        printf("%s", p);
    }

Printing the integral part of FLT_MAX works flawlessly with base 2 and base 16:

    11111111111111111111111100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 (base 2)
    ffffff00000000000000000000000000 (base 16)

However, printing in base 10 leads to errors after the first 7 digits:

    340282368002860660002286082464244022240 (my own function)
    340282346638528859811704183484516925440 (printf)

I assume this is the result of dividing by 10. It gets better if I use double instead of float:

    340282346638528986604286022844204804240 (my own function)
    340282346638528859811704183484516925440 (printf)

(If you do not believe printf, enter 2^128-2^104 in Wolfram Alpha. It is correct.)

Now, how does printf manage to print the correct result? Does it use some bigint facilities internally? Or is there some kind of floating point trick I'm missing?

+4
6 answers

It seems like the workhorse for converting floating point to string is the dtoa() function. See dtoa.c in newlib for how they do it.

Now, how did printf manage to print the correct result?

I think this is close to magic. At least the source is like a dark spell.

Does it use some bigint objects inside?

Yes, search for _Bigint in the linked source file.

Or is there some kind of floating point trick I'm missing?

Probably.

+2

I believe the problem lies in value /= base;. Remember that 1/10 is not a finite fraction in binary, so this calculation is generally not exact. I also assume that some error will occur in fmod for the same reason.

printf will first compute the integral part and then convert it to decimal (if I understand correctly how you print the integral part).
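A quick way to check whether a particular division by 10 was rounded is to multiply the float quotient back in double precision, where a 24-bit significand times 10 always fits exactly. The sketch below does that; division_by_10_is_exact is a name invented for this example.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical helper: returns true if v / 10 is exactly representable as a
   float, i.e. the division in value /= base loses no information. */
bool division_by_10_is_exact(float v)
{
    assert(isfinite(v));

    /* volatile forces the quotient to be rounded to float precision even on
       compilers that evaluate float expressions in a wider type. */
    volatile float q = v / 10.0f;

    /* q has at most 24 significant bits, so q * 10 is exact in double. */
    return (double)q * 10.0 == (double)v;
}
```

For example, 20.0f divides exactly while 3.0f does not. FLT_MAX itself happens to divide exactly once (its significand 16777215 is a multiple of 5), but the quotient of that division no longer does, so the error creeps in from the second step onward.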

+3

Edit: Read Unni's answer first. These results come from http://codepad.org/TLqQzLO3.

    void print_integral_part(float value)
    {
        printf("input : %f\n", value);
        char a[129]; // worst case is 128 digits for base 2 plus NUL
        char * p = a + 128;
        *p = 0;
        do {
            int digit = fmod(value, base);
            value /= base;
            printf("interm: %f\n", value);
            *--p = hex[digit];
        } while (value >= 1);
        printf("result: %s\n", p);
    }

    print_integral_part(3.40282347e+38F);

This shows how your value gets corrupted by the value /= base operation:

    input : 340282346638528859811704183484516925440.000000
    interm: 34028234663852885981170418348451692544.000000
    interm: 3402823466385288480057879763104038912.000000
    interm: 340282359315034876851393457419190272.000000
    interm: 34028234346940236846450271659753472.000000
    interm: 3402823335658820218996583884128256.000000
    interm: 340282327376181848531187106054144.000000
    interm: 34028232737618183051678859657216.000000
    interm: 3402823225404785588136713388032.000000
    interm: 340282334629736780292710989824.000000
    interm: 34028231951816403862828351488.000000
    interm: 3402823242405304929106264064.000000
    interm: 340282336046446683592065024.000000
    interm: 34028232866774907300610048.000000
    interm: 3402823378911210969759744.000000
    interm: 340282332126513595416576.000000
    interm: 34028233212651357863936.000000
    interm: 3402823276229139890176.000000
    interm: 340282333252413489152.000000
    interm: 34028234732616232960.000000
    interm: 3402823561222553600.000000
    interm: 340282356122255360.000000
    interm: 34028235612225536.000000
    interm: 3402823561222553.500000
    interm: 340282366859673.625000
    interm: 34028237357056.000000
    interm: 3402823735705.600098
    interm: 340282363084.799988
    interm: 34028237619.200001
    interm: 3402823680.000000
    interm: 340282368.000000
    interm: 34028236.800000
    interm: 3402823.600000
    interm: 340282.350000
    interm: 34028.234375
    interm: 3402.823438
    interm: 340.282349
    interm: 34.028235
    interm: 3.402824
    interm: 0.340282
    result: 340282368002860660002286082464244022240

If in doubt, throw more printfs at it;)

+3

According to the IEEE single-precision format, a float variable holds only 24 bits of significand at any time. This means that only about 7 decimal digits are stored in a float.

The rest of the magnitude is stored in the exponent. FLT_MAX is defined as 3.402823466e+38F. So beyond the 10th digit of precision, which digit should be printed is not defined anywhere.

With the Visual C++ 2010 compiler I get this output: 3402823466385288600000000000000000000000000000000, which is the only valid output.

So initially we have this many valid digits: 3402823466. After the first division we have only 0402823466. The system then needs to drop the leading 0 and bring in a new digit on the right. In an ideal integer division, that digit would be 0. Since you are doing a floating-point division (value /= base;), the system gets some other digit to fill that position.

So, in my opinion, printf can assign the above significant digits to an integer and work with it.
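As a sketch of what "assign the significant digits to an integer" could look like, the standard frexpf and ldexpf functions can split a float into a 24-bit integer significand and a power-of-two exponent. Whether any particular printf does it this way is an assumption; decompose is a name made up for this example.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Hypothetical helper: splits a positive finite float so that
   value == *mant * 2^(*exp2), with *mant an integer below 2^24. */
void decompose(float value, uint32_t *mant, int *exp2)
{
    int e;

    assert(isfinite(value) && value > 0);

    float frac = frexpf(value, &e);      /* value == frac * 2^e, frac in [0.5, 1) */
    *mant = (uint32_t)ldexpf(frac, 24);  /* scale the fraction to a 24-bit integer */
    *exp2 = e - 24;                      /* compensate for the scaling */
}
```

For FLT_MAX this gives the significand 0xffffff and the exponent 104, which matches the base 16 output in the question: ffffff followed by 26 zero hex digits (104 bits).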

+2

Let me explain it once more. After the integer part has been printed exactly, with no rounding other than truncation toward 0, it is time for the fractional bits.

Start with a string of bytes (say 100 for starters) containing binary zeros. If the first bit to the right of the binary point in the fp value is set, then 0.5 (2^-1 or 1/(2^1)) is a component of the fraction, so add 5 to the first byte. If the next bit is set, 0.25 (2^-2 or 1/(2^2)) is part of the fraction: add 5 to the second byte and 2 to the first (oh, and don't forget the carry, it happens; grade-school math). If the next bit is set, it contributes 0.125, so add 5 to the third byte, 2 to the second and 1 to the first. And so on:

             value          string of binary 0s
    start    0              0000000000000000000 ...
    bit 1    0.5            5000000000000000000 ...
    bit 2    0.25           7500000000000000000 ...
    bit 3    0.125          8750000000000000000 ...
    bit 4    0.0625         9375000000000000000 ...
    bit 5    0.03125        9687500000000000000 ...
    bit 6    0.015625       9843750000000000000 ...
    bit 7    0.0078125      9921875000000000000 ...
    bit 8    0.00390625     9960937500000000000 ...
    bit 9    0.001953125    9980468750000000000 ...
    ...

I did this by hand, so I may have missed something, but implementing it in code is trivial.

So, for all those "can't get an exact result using float" people on SO who don't know what they are talking about, here is proof that floating-point fraction values are perfectly exact. Painfully exact. But binary.

For those who take the time to understand how this works, better precision is well within reach. As for the others... well, I guess they will keep not searching for the answer to a question that has been answered many times before, honestly believing that they have discovered "broken floating point" (or whatever they call it), and posting a new variant of the same question every day.
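One possible implementation of the byte-string procedure, as a rough sketch: one array accumulates the running decimal total, a second holds the decimal digits of the current power 2^-k and is halved after every bit. The names fraction_to_decimal and NDIGITS and the two-array layout are choices made for this example only.

```c
#include <assert.h>
#include <math.h>
#include <string.h>

/* The smallest bit of a float is 2^-149, which needs 149 decimal places. */
#define NDIGITS 160

/* Hypothetical helper: writes the exact decimal expansion of the fractional
   part of a non-negative finite float into out (NDIGITS digits plus NUL). */
void fraction_to_decimal(float value, char out[NDIGITS + 1])
{
    unsigned char sum[NDIGITS];   /* running total, one decimal digit per byte */
    unsigned char pow2[NDIGITS];  /* decimal digits of 2^-k for the current bit */
    float ipart;

    assert(isfinite(value) && value >= 0);
    memset(sum, 0, sizeof sum);
    memset(pow2, 0, sizeof pow2);
    pow2[0] = 5;                  /* 2^-1 = 0.5 */

    float frac = modff(value, &ipart);
    while (frac > 0) {
        frac *= 2;                /* exact: moves the next bit left of the point */
        if (frac >= 1) {
            /* Bit is set: add 2^-k to the total, carrying right to left. */
            int carry = 0;
            frac -= 1;
            for (int i = NDIGITS - 1; i >= 0; i--) {
                int d = sum[i] + pow2[i] + carry;
                sum[i] = (unsigned char)(d % 10);
                carry = d / 10;
            }
        }
        /* Halve 2^-k by long division, left to right. */
        int rem = 0;
        for (int i = 0; i < NDIGITS; i++) {
            int d = rem * 10 + pow2[i];
            pow2[i] = (unsigned char)(d / 2);
            rem = d % 2;
        }
    }

    for (int i = 0; i < NDIGITS; i++)
        out[i] = (char)('0' + sum[i]);
    out[NDIGITS] = 0;
}
```

With all nine bits from the table set (the value 0.998046875f), the output starts with 998046875 followed by zeros, matching the last row worked out by hand.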

"Close to magic", "dark spell" - it's fun!

+1

As Agent_L's answer says, you are getting a wrong result because of dividing the value by 10. Float, like any binary floating-point type, cannot exactly represent most rational numbers that are finite in decimal. After a division, in most cases the result does not fit exactly in binary, so it is rounded. Hence the more you divide, the more error you accumulate.

If the number is not very large, a quick solution is to multiply it by 10 or a power of 10, depending on how many digits after the decimal point you need.

Another method has been described here.

0

Source: https://habr.com/ru/post/1416334/

