Exponent and Fraction Sizes in float256

The table below shows what I am asking:

╔════════╦════════╦════════════╦════════════╗
║  name  ║  sign  ║  exponent  ║  fraction  ║
╠════════╬════════╬════════════╬════════════╣
║float16 ║   1    ║     5      ║     10     ║
╠════════╬════════╬════════════╬════════════╣
║float32 ║   1    ║     8      ║     23     ║
╠════════╬════════╬════════════╬════════════╣
║float64 ║   1    ║     11     ║     52     ║
╠════════╬════════╬════════════╬════════════╣
║float128║   1    ║     15     ║    112     ║
╠════════╬════════╬════════════╬════════════╣
║float256║   1    ║    ????    ║    ????    ║
╠════════╬════════╬════════════╬════════════╣
║float512║   1    ║    ????    ║    ????    ║
╚════════╩════════╩════════════╩════════════╝

My question is how to calculate the number of exponent and fraction bits from the total number of bits, e.g. 256, 512 or 1024.

+4
3 answers

Early drafts of IEEE 754-2008 gave guidelines for the exponent and significand field widths of floating-point formats of arbitrary width. This was never a strict requirement, only a recommended practice. It was judged too cumbersome for too little benefit, so it was mostly dropped from the standard and replaced by:

Language standards should define mechanisms supporting extendable precision for each supported radix. Language standards supporting extendable precision shall permit users to specify p and emax. Language standards shall also allow extendable precision to be specified by specifying p alone; in this case emax shall be defined by the language standard to be at least 1000×p when p is ≥ 237 bits in a binary format or p is ≥ 51 digits in a decimal format.

(3.7 Extended and extendable precisions, p14).

However, the standard does still define (optional) "interchange formats" for every width that is a multiple of 32 bits larger than 128, in the tables in section 3.6 (p13). Specifically, a binary format of width k has an exponent field of round(4*log2(k)) - 13 bits. For the specific case k=256 this gives:

 exponent:    round(4*log2(256)) - 13 = 32 - 13 = 19
 significand: 256 - 1 - 19 = 236

For a 384-bit format, the same formula gives an exponent width of:

 round(4*log2(384)) - 13 = round(34.339850002884624) - 13 = 21 bits 
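The two calculations above can be wrapped in a small function. This is only a sketch of the formula quoted in this answer: the function name `interchange_field_widths` is made up for illustration, and the check on k (a multiple of 32, at least 128) is my reading of where the formula applies. 128 is allowed because the formula reproduces the float128 row of the table in the question.

```python
import math

def interchange_field_widths(k):
    """Return (sign, exponent, fraction) bit counts for a binary format of width k.

    Hypothetical helper: applies round(4*log2(k)) - 13 from the answer above.
    Assumes k is a multiple of 32 and at least 128.
    """
    if k < 128 or k % 32 != 0:
        raise ValueError("k must be a multiple of 32 and at least 128")
    exponent = round(4 * math.log2(k)) - 13
    fraction = k - 1 - exponent  # whatever is left after sign and exponent
    return 1, exponent, fraction

# Print the rows asked about in the question, plus 384 and 1024.
for k in (128, 256, 384, 512, 1024):
    sign, exponent, fraction = interchange_field_widths(k)
    print(f"float{k}: sign={sign} exponent={exponent} fraction={fraction}")
```

For k=256 this returns 19 exponent bits and 236 fraction bits, matching the hand calculation. The formula is not meant for the smaller formats: for k=32 it would give 7 exponent bits, while float32 actually has 8.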

Keep in mind that many arbitrary-precision floating-point packages do not follow these guidelines. This is just the definition of the "binary256" interchange format, not what any particular implementation necessarily uses.

+7

There is no 256-bit floating-point value in the IEEE 754-2008 standard.

The number of bits in each field is not calculated; the sizes were chosen arbitrarily to give a certain precision and range. If you want to create your own 256-bit floating-point format, you can simply choose the sizes that give you the precision and range you want.
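To see what a given choice of sizes buys you, here is a rough sketch that turns exponent and fraction widths into precision and range. It assumes the usual IEEE-style encoding (an implicit leading bit and a bias of 2^(e-1) - 1); a home-made format does not have to work that way. The name `describe_layout` is invented for this example.

```python
import math

def describe_layout(exponent_bits, fraction_bits):
    """Summarise precision and range for an IEEE-style binary layout.

    Hypothetical helper; assumes an implicit leading significand bit
    and an exponent bias of 2**(exponent_bits - 1) - 1.
    """
    precision = fraction_bits + 1        # significand bits, counting the implicit one
    emax = 2 ** (exponent_bits - 1) - 1  # largest exponent of a finite number
    log10_2 = math.log10(2)
    return {
        "precision_bits": precision,
        "emax": emax,
        # decimal digits that always survive a round trip through the format
        "decimal_digits": math.floor((precision - 1) * log10_2),
        # power of ten of the largest finite value (just below 2**(emax + 1))
        "max_decimal_exponent": math.floor((emax + 1) * log10_2),
    }

print(describe_layout(11, 52))   # the float64 row from the question
print(describe_layout(19, 236))  # one possible 256-bit split
```

For the float64 row this reports 53 bits of precision, about 15 decimal digits, and a largest value around 10^308. Trying different splits of the 255 non-sign bits shows the trade-off between range and precision.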

0

The values in the table are specified in the IEEE 754-2008 standard, which goes up to 128 bits. If you have hardware or software that implements floating point with even more bits, you need to consult its documentation.

0
