NAN Propagation and the IEEE 754 Standard

I am developing a new microprocessor instruction set ( www.forwardcom.info ) and I want to use NAN propagation to track errors. However, there are a few oddities in the IEEE 754 floating point standard that prevent this.

Firstly, the reason I want to use NAN propagation rather than error trapping is that I have variable-length vector registers. If, for example, I have a float vector with 8 elements, and I have 1/0 in the first element and 0/0 in the sixth element, then I get only one trap, but if I run the same program on a computer with half the vector length, then I get two traps: one for infinity and one for NAN. I want the result to be independent of the vector length, so I need to rely on the propagation of NAN and INF rather than on trapping. The NAN and INF values will propagate through the computations so that they can be checked in the final result. The NAN representation contains some bits called the payload that can be used to carry information about the source of the error.

However, there are two problems with the IEEE 754 floating point standard that prevent reliable propagation of NAN values.

The first problem is that the result of combining two NANs with different payloads is just one of the two inputs. For example, NAN1 + NAN2 gives NAN1. This violates the fundamental principle that a + b = b + a. The compiler may swap the operands, so you can get different results with different compilers or with different optimization options. I would prefer to get the bitwise OR of the two payloads. This works if you have one bit for each error condition, but of course not if the payload contains more complex information (for example, NAN boxing in dynamically typed languages). The standards committee actually discussed the OR'ing solution (see http://grouper.ieee.org/groups/754/email/msg01094.html ). I do not know why they rejected this proposal.
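The OR rule described above can be modelled in software. This is a minimal sketch, not ForwardCom's actual implementation: the helper names (`make_nan`, `nan_payload`, `add_or_payloads`) are hypothetical, and it assumes the common binary64 layout in which the top significand bit is the quiet bit and the remaining 51 bits are the payload.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Assumed binary64 layout: exponent field, quiet bit, 51 payload bits.
constexpr std::uint64_t kExponentMask = 0x7FF0000000000000ULL;
constexpr std::uint64_t kQuietBit     = 0x0008000000000000ULL;
constexpr std::uint64_t kPayloadMask  = 0x0007FFFFFFFFFFFFULL;

inline std::uint64_t to_bits(double x) {
    std::uint64_t b;
    std::memcpy(&b, &x, sizeof b);   // well-defined bit copy
    return b;
}

inline double from_bits(std::uint64_t b) {
    double x;
    std::memcpy(&x, &b, sizeof x);
    return x;
}

// Hypothetical helper: quiet NaN carrying the given payload bits.
inline double make_nan(std::uint64_t payload) {
    return from_bits(kExponentMask | kQuietBit | (payload & kPayloadMask));
}

// Hypothetical helper: payload of x if x is a NaN, otherwise 0.
inline std::uint64_t nan_payload(double x) {
    return (x != x) ? (to_bits(x) & kPayloadMask) : 0;
}

// Model of the proposed rule: if either operand is a NaN, the result is a
// quiet NaN whose payload is the bitwise OR of the operand payloads.
inline double add_or_payloads(double a, double b) {
    if (a != a || b != b) {
        return make_nan(nan_payload(a) | nan_payload(b));
    }
    return a + b;
}

// Usage: the result no longer depends on the order of the operands.
inline void demo_commutative() {
    const double nan1 = make_nan(0x1);
    const double nan2 = make_nan(0x4);
    assert(nan_payload(add_or_payloads(nan1, nan2)) == 0x5);
    assert(nan_payload(add_or_payloads(nan2, nan1)) == 0x5);
}
```

With this rule both operand orders yield the same payload, so a compiler that swaps the operands cannot change the result.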

The second problem is that the min and max functions do not propagate a NAN if only one of the inputs is a NAN. In other words, min (1, NAN) = 1. Reliable NAN propagation would of course require that min (1, NAN) = NAN. I have no idea why the standard says this.

In the new microprocessor system called ForwardCom, I want to avoid these unfortunate quirks and specify that NAN1 + NAN2 = NAN1 | NAN2 and min (1, NAN) = NAN.

And now to my questions: First, do I need an option switch to change between strict IEEE compliance and reliable NAN propagation? Quoting the standard:

Quiet NaNs should, by means left to the implementer's discretion, afford retrospective diagnostic information inherited from invalid or unavailable data and results. To facilitate propagation of diagnostic information contained in NaNs, as much of that information as possible should be preserved in NaN results of operations.

Note that the standard says “should” here, where it says “shall” elsewhere. Does this mean that I am allowed to deviate from the recommendation?

And the second question: I cannot find any examples where NAN propagation is actually used to track errors. Perhaps this is due to the weaknesses of the standard. I want to define different payload bits for different error conditions, for example:

  • 0/0, 0 * ∞, ∞ / ∞, modulo (1, 0), modulo (∞, 1), ∞ - ∞ and other errors related to infinity and division by zero.

  • sqrt (-1), log (-1), pow (-1, 0.1) and other errors arising from logarithms and powers.

  • asin (2) and other mathematical functions.

  • Explicit assignment. This can be useful when a variable is initialized to NAN.

That still leaves plenty of free bits for user-defined error codes.
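The bit assignment listed above could be sketched as follows. The enumerator names, their bit positions, and the helpers `tagged_nan`, `reasons_of` and `checked_sqrt` are hypothetical illustrations, not part of ForwardCom or of IEEE 754, and the code assumes the common binary64 layout with a 51-bit payload below the quiet bit.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical payload bits, one per error category from the list above.
enum NanReason : std::uint64_t {
    kDivisionOrInfinity = 1u << 0,  // 0/0, 0*inf, inf/inf, inf-inf, ...
    kLogOrPower         = 1u << 1,  // sqrt(-1), log(-1), pow(-1, 0.1), ...
    kOtherMathFunction  = 1u << 2,  // asin(2), ...
    kExplicitAssignment = 1u << 3   // variable initialized to NaN
};

constexpr std::uint64_t kQuietNanBits = 0x7FF8000000000000ULL;
constexpr std::uint64_t kPayloadMask  = 0x0007FFFFFFFFFFFFULL;

// Build a quiet NaN whose payload holds the given reason bits.
inline double tagged_nan(std::uint64_t reasons) {
    const std::uint64_t bits = kQuietNanBits | (reasons & kPayloadMask);
    double x;
    std::memcpy(&x, &bits, sizeof x);
    return x;
}

// Read the reason bits back; returns 0 for anything that is not a NaN.
inline std::uint64_t reasons_of(double x) {
    if (!std::isnan(x)) return 0;
    std::uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    return bits & kPayloadMask;
}

// Example producer: a square root that tags its own domain error.
inline double checked_sqrt(double x) {
    return (x < 0.0) ? tagged_nan(kLogOrPower) : std::sqrt(x);
}

// Usage: decode the reason at the end of a computation.
inline void demo_reasons() {
    const double r = checked_sqrt(-1.0);
    assert(std::isnan(r));
    assert(reasons_of(r) == kLogOrPower);
}
```

Whether such a payload survives ordinary hardware arithmetic unchanged depends on the platform, which is exactly the concern raised in the question.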

Has this been done before, or do I have to invent it all from scratch? Are there any problems that I have to consider (other than NAN boxing in some languages)?

+7
c++ floating-point ieee-754 nan
4 answers

Yes, you are allowed to deviate from "should". From the specification (§1.6):

- may indicates a course of action permissible within the limits of the standard with no implied preference ("may" means "is permitted to")

- shall indicates mandatory requirements strictly to be followed in order to conform to the standard and from which no deviation is permitted ("shall" means "is required to")

- should indicates that among several possibilities, one is recommended as particularly suitable, without mentioning or excluding others; or that a certain course of action is preferred but not necessarily required; or that (in the negative form) a certain course of action is deprecated but not prohibited ("should" means "is recommended to").

Regarding the min behavior, the Intel implementation also differs from the IEEE specification. From the Intel instruction set reference for MINSD:

If the value in the second source operand is SNaN, then the SNaN is returned unchanged to the destination (that is, the QNaN version of the SNaN is not returned).

If only one value is a NaN (SNaN or QNaN) for this instruction, the second source operand, either a NaN or a valid floating-point value, is written to the result. If instead of this behavior it is required that the NaN source operand (from either the first or second source) be returned, the action of MINSD can be emulated using a sequence of instructions, such as a comparison followed by AND, ANDN and OR.

In other words, this corresponds to x < y ? x : y. I'm not really sure what specific sequence they mean, but an alternative approach is suggested here: https://github.com/JuliaLang/julia/issues/7866#issuecomment-51845730 .
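As an illustration, here is a portable model of that behavior together with one plausible reading of the "comparison followed by AND, ANDN and OR" sequence. The sequence is an assumption on my part, not Intel's documented one, and the function names are hypothetical. The masks are computed on the bit patterns, the way the SSE instructions would do it.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

inline std::uint64_t to_bits(double x) {
    std::uint64_t b;
    std::memcpy(&b, &x, sizeof b);
    return b;
}

inline double from_bits(std::uint64_t b) {
    double x;
    std::memcpy(&x, &b, sizeof x);
    return x;
}

// Model of the MINSD rule quoted above: a comparison with a NaN is false,
// so the second operand is returned whenever either input is a NaN.
inline double min_second_operand(double x, double y) {
    return x < y ? x : y;
}

// Assumed emulation sequence:
//   mask   = all ones if x is a NaN          (compare x with itself, unordered)
//   result = (mask AND x) OR (NOT mask AND min_second_operand(x, y))
inline double min_nan_propagating(double x, double y) {
    const std::uint64_t mask = (x != x) ? ~std::uint64_t{0} : std::uint64_t{0};
    const std::uint64_t m = to_bits(min_second_operand(x, y));
    return from_bits((mask & to_bits(x)) | (~mask & m));
}

// Usage: the plain rule depends on operand order, the emulation does not.
inline void demo_min() {
    const double nan = std::numeric_limits<double>::quiet_NaN();
    assert(min_second_operand(nan, 1.0) == 1.0);        // NaN is dropped
    assert(std::isnan(min_second_operand(1.0, nan)));   // NaN is kept
    assert(std::isnan(min_nan_propagating(nan, 1.0)));
    assert(std::isnan(min_nan_propagating(1.0, nan)));
}
```

If x is not a NaN, the mask is zero and the plain rule already returns y, which covers the case where only y is a NaN.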

+3

Some thoughts:

The second problem is that the min and max functions do not propagate a NAN if only one of the inputs is a NAN. In other words, min (1, NAN) = 1. Reliable NAN propagation would of course require that min (1, NAN) = NAN. I have no idea why the standard says this.

The current draft for the next revision of IEEE 754 contains both the NaN-favoring minimum and maximum and the number-favoring minimumNumber and maximumNumber. This means that an application will be able to choose whichever suits it, but your instruction set will need to support both if you intend it to conform. (Note “support,” not “implement.” An instruction set does not need to implement IEEE 754 operations directly in individual instructions for a computing platform to conform to IEEE 754; it only needs to provide instructions from which a conforming platform can be built. It is fine if an IEEE 754 operation requires several instructions or support from the operating system or libraries.)
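A minimal sketch of the difference between the two kinds of operation, as the paragraph above describes them. The names `ieee_minimum` and `ieee_minimum_number` are illustrative and are not standard library functions. Signaling NaNs and exception flags are deliberately ignored, so this is not a conforming implementation.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// NaN-favoring: the result is a NaN if either operand is a NaN.
inline double ieee_minimum(double x, double y) {
    if (std::isnan(x)) return x;
    if (std::isnan(y)) return y;
    if (x == 0.0 && y == 0.0) return std::signbit(x) ? x : y;  // -0 below +0
    return x < y ? x : y;
}

// Number-favoring: a single NaN operand is ignored; the result is a NaN
// only when both operands are NaNs.
inline double ieee_minimum_number(double x, double y) {
    if (std::isnan(x)) return y;   // if y is also a NaN, a NaN is returned
    if (std::isnan(y)) return x;
    if (x == 0.0 && y == 0.0) return std::signbit(x) ? x : y;  // -0 below +0
    return x < y ? x : y;
}

// Usage: the two operations differ only when exactly one input is a NaN.
inline void demo_minimum() {
    const double nan = std::numeric_limits<double>::quiet_NaN();
    assert(std::isnan(ieee_minimum(1.0, nan)));
    assert(ieee_minimum_number(1.0, nan) == 1.0);
    assert(ieee_minimum_number(nan, 1.0) == 1.0);
}
```

Both functions are symmetric in their operands, unlike the x < y ? x : y rule.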

And now to my questions: first, do I need an option switch to switch between strict IEEE compliance and reliable NAN propagation?

Since returning the recommended NaN is only a “should” in the standard, you do not need to return it in order to conform. However, minimum(1, NaN) should return a NaN.

Of course, you do not need to do this with a switch, and environment state is frowned upon because of the drag it puts on performance. The choice between behaviors can be made by using different instructions, or by different inputs to the instructions through an additional register or additional bits that accompany the contents of a regular register.

And the second question: I can’t find examples where NAN propagation is actually used to track errors.

I remember at least one IEEE 754 committee member using the NaN payload, but I don’t remember who or the details.

+1

Regarding the addition of two NANs: when you add two NANs with different payloads, you get only one of them, usually the first. This makes a + b different from b + a, which is unacceptable because the compiler may swap the operands. Above, I suggested returning the bitwise OR of the two payloads. Thinking about it further, there is another possible solution: return the larger of the two payloads.

The OR solution has the advantage of being simple. The disadvantage is that it limits the useful information you can have in the payload to one bit for each possible error condition. This would still be very useful, since the number of different events that can generate a NaN is smaller than the number of payload bits.

The second solution, in which you return the larger of the two payloads, requires a bit more hardware. The advantage is that you can have more detailed information in the payload, possibly including information about where the error occurred. The disadvantage is that you only propagate information about the worse of the two errors. This solution is fully compatible with the current standard. Newer processors can implement it without needing a switch for backward compatibility.
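The larger-payload rule can be modelled in the same way. Again this is a sketch under assumptions: the helper names are hypothetical, the payload is taken to be the low 51 bits of a binary64 quiet NaN, and the sign bit is ignored.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

constexpr std::uint64_t kQuietNanBits = 0x7FF8000000000000ULL;
constexpr std::uint64_t kPayloadMask  = 0x0007FFFFFFFFFFFFULL;

// Hypothetical helper: quiet NaN carrying the given payload.
inline double make_nan(std::uint64_t payload) {
    const std::uint64_t bits = kQuietNanBits | (payload & kPayloadMask);
    double x;
    std::memcpy(&x, &bits, sizeof x);
    return x;
}

// Hypothetical helper: payload of x if x is a NaN, otherwise 0.
inline std::uint64_t nan_payload(double x) {
    if (x == x) return 0;  // not a NaN
    std::uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    return bits & kPayloadMask;
}

// Model of the alternative rule: when either operand is a NaN, return a
// quiet NaN with the numerically larger of the two payloads.
inline double add_max_payload(double a, double b) {
    if (a != a || b != b) {
        const std::uint64_t pa = nan_payload(a);
        const std::uint64_t pb = nan_payload(b);
        return make_nan(pa > pb ? pa : pb);
    }
    return a + b;
}

// Usage: the result is one of the input payloads, whatever the order.
inline void demo_max_payload() {
    const double nan1 = make_nan(3);
    const double nan2 = make_nan(9);
    assert(nan_payload(add_max_payload(nan1, nan2)) == 9);
    assert(nan_payload(add_max_payload(nan2, nan1)) == 9);
}
```

Because the result carries the payload of one of the inputs, the payload can encode structured information rather than one bit per condition, at the cost of losing the smaller payload.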

0

Just to add to this discussion: the IEEE standard explicitly allows this flexibility in encoding errors in NaNs, but indicates that it should be done by programming language implementations rather than at the hardware level. That said, I really like the idea of supporting NaN-poisoning semantics using bitwise OR at the hardware level. I have been looking into adding the same semantics to the GHC Haskell compiler.

However, I believe that trapping/signaling semantics would still be useful. In many programming languages and programs, the set of enabled traps can be regarded as intermittent exceptions in the underlying computation. This means that the platform-dependent question of whether one or both of two simultaneous errors were reported does not change the "meaning" of the local computation. (In fact, it can be argued that many high-level programming languages would benefit from support for handling signaling as exceptions, which seems to be mostly absent.)

0
