Some questions about floating point and numeric limits

Question

I know there are questions like this, but I could not find the answers. Please read before voting to close (:

According to PC ASM:

  The numeric coprocessor has eight floating point registers. 
 Each register holds 80 bits of data. 
 Floating point numbers are always stored as 80-bit 
 extended precision numbers in these registers.

How is this possible when sizeof shows something different? For example, on the x64 architecture, sizeof(double) is 8, and that is far from 80 bits.

Why does std::numeric_limits<long double>::max() give me 1.18973e+4932?! This is a huuuuuuuuuuuge number. If this is not the way to get the maximum value of a floating point type, then why does it compile at all, and even more, why does it return a value?
What does this mean:

  Double precision magnitudes can range from approximately 10 ^ −308 to 10 ^ 308

These are huge numbers. How can you store them in 8 bytes, or even in 16 bytes (which is extended precision and still only 128 bits)?

Obviously I'm missing something. In fact, obviously, a lot of things.

Tags: c++ floating-point limit

Kiril Kirov May 12 '11 at 16:21

4 answers

A double is not the 80-bit floating point format of the Intel coprocessor; it is a 64-bit IEEE 754 floating point number. With sizeof(double) you get the size of the latter.
This is the right way to get the maximum value of long double, so that part of your question is moot.
You are probably missing that floating point numbers are not exact numbers. 10 ^ 308 does not store 308 digits, only the leading significant ones: about 15 to 17 decimal digits for a double, about 19 for the 80-bit extended format.

hirschhornsalz May 12 '11 at 4:34 p.m.

The amount of space the FPU uses internally and the amount of space used in memory to represent a double are two different things. IEEE 754 (which most architectures probably use) specifies 32-bit single-precision and 64-bit double-precision numbers, which is why sizeof(double) gives you 8 bytes. The Intel x86 FPU does its floating point math internally using 80 bits.

std::numeric_limits<long double>::max() gives you the correct maximum value for a long double, which is usually an 80-bit type. If you want the maximum value of a 64-bit double, use double as the template parameter.

As for the question about ranges, why do you think you cannot store them in 8 bytes? They really do fit. What you are missing is that many numbers at the extremes of the range cannot be represented (for example, as the exponent approaches 308, there are many integers that cannot be represented at all).

See also http://floating-point-gui.de/ for floating point information.

Mark B May 12, '11 at 16:25

Floating point numbers on a computer are represented according to IEEE 754-2008.

It defines several formats, among which
binary32 = Single precision,
binary64 = Double precision and
binary128 = Quadruple precision are the most common.
http://en.wikipedia.org/wiki/IEEE_754-2008#Basic_formats

A double-precision number has 52 bits for the fraction, which give the precision, and 11 bits for the exponent, which give the magnitude.
So a double is 1.xxx (52 binary digits) * 2 ^ exponent, where the biased 11-bit exponent reaches at most 1023, so every finite value is below 2 ^ 1024.

And 2 ^ 1024 = 1.79 * 10 ^ 308
That is why this is the upper bound on the values you can store in a double.

With a quadruple-precision number, there are 112 bits of precision and 15 bits for the exponent, so every finite value is below 2 ^ 16384.

As 2 ^ 16384 gives 1.18 * 10 ^ 4932, you can see that your C++ test was perfectly correct. The 80-bit x87 extended format, which is what long double typically is on x64, has the same 15-bit exponent as quadruple precision but only a 64-bit significand, which is why it reaches the same range.

jmd May 12 '11 at 16:49

Bill · Accepted Answer · 2011-05-12 16:28

1) sizeof is the size in memory, not in a register. sizeof is in bytes, so 8 bytes = 64 bits. When doubles are being calculated in registers (on this architecture), they get an extra 16 bits for more precise intermediate calculations. When the value is copied back to memory, the extra 16 bits are lost.

2) Why do you think a long double doesn't go up to 1.18973e+4932?

3) Why can't you store 10 ^ 308 in 8 bytes? I only need 13 bits: 4 to store the 10 and 9 to store the 308.
