Will the double byte hash always be the same for the same value?

I think my question can also be asked as follows: can a single decimal value be represented in more than one way in a double-precision variable.

I have a hash table implementation, double-precision floating-point numbers will be keys, and I use a hash algorithm that builds a hash, iterate over each double byte (which is at least 64-bit on my system, so 8 bytes for the hash ) My problem is that if a single value, for example, "1.2345", can be represented in binary form in a double of more than 1 path, then this can lead to many possible hash values ​​for a single value.

I am not sure where to explore this possibility. If I were to guess, I would suggest that this is not possible, or that if it is possible, something normalizes it to ensure that the value always has the same idea in the given system. I am mainly looking for confirmation of this.

If a value can have multiple representations, then I will need to normalize the value before hashing it, and I will like suggestions on how to do this.

EDIT:

I learned a little more about floating point numbers. They are stored as a mantessa and an exhibitor. So my question is that a single floating point number can be represented by more than one combination of mantem and exponent.

+4
source share
3 answers

You will have two problems with byte hashing of floating point numbers.

The first, which unexpectedly arises, is that for 0 there are two representations, one with each possible sign bit. (0 and negative 0.) They are not only mathematically equal, but also necessary to check for equal values ​​in C / C ++. A negative 0 is unexpectedly detected in the calculations, although not all printf prints it as -0. (On Intel hardware, at least 0.0 / -1.0 - -0.0.)

So, you need to make sure that both zero hashes are the same.

Another problem is NaNs. There are many NaNs, but they are not (technically) comparable even to themselves, so they make lousy hash keys. Probably the easiest solution is to ignore them, because no one should expect NaN to be used as a hash key. But the problem is that if someone tries to put it in their hash table and then enters it again, it can be entered twice if you use floating point == to check for the key. Therefore, a simple error (or deliberate attack) can quickly run out of memory by expanding the hash table. (If you compare bytes with memcmp, you will not have this problem.)

+2
source

The IEEE 754 binary floating point has exactly one encoding for each represented value, except zero, for which there are +0 and a -0.

Most C implementations today use the IEEE 754 binary format. However, not all implement floating point operations correctly (for example, converting from a decimal number in a character string to a binary floating point can lead to a slightly inaccurate result), and the results of the operation chains may differ from the value which you get with exact arithmetic, and different chains can give different values, even if they would be the same mathematically. (This last one includes the result of compiling the same source code with different compilers that do not provide a strict floating point estimate.)

IEEE 754 also defines a decimal floating-point format and has several representations for values.

+2
source

It is possible. Double precision values ​​are more accurate than floating values, but still are not 100% airtight when it comes to accuracy. Use representations with a fixed or integer pair (i.e., an integer for each part of the value).

0
source

All Articles