Hashcode calculation to multiply and ignore overflow bits?

This question is not about why one multiplies, which is pretty obvious about its distribution.

Why use a prime in hashCode?

But rather, it is more about one property of multiplication, which becomes more important, the more factors are included in the hash code calculation formula.

A simple calculation can obviously overwhelm, but that doesn't really matter.

a * 31 + b 

The real problem is when many elements are in the formula.

 ((a * 31) + b) * 31 ... 6n. 

After more than 5 or 6 terms include the value of the first term, it is lost because its bits overflow by the time the hash code reaches the inclusion of 5+. Using this system, only the last 5 or so terms are really significant for the final meaning.

 31 ^ 7 > Integer.MAX_VALUE 

So why do most calculations roll bits that overflow back, and xor w / lower bits of the result. I understand that this requires some bit, and the calculations must be done using longs (64 bits), so the top 32 bits can be XOR'd with the whole result, but at least no bits will be lost.

Is there any special reason why overflow is ignored? Its not so expensive to use a long one, as described earlier.

EDIT

 100000*31^7= 2751261411100000 0x9C641F717C560 6553600000*31^7 180306667837849600000 0xC641F717C5600000 

Note that the latter value is exactly 65,536 times the previous one, which also means that its response is 16 bits larger. Note that the integer value 0xC641F717C5600000 is 0xC5600000, the actual significant values ​​are lost from the 16-bit value.

 *SAMPLE A* 65536*4096*27512614111 =7385361114638319616 =0x667E12CDF0000000 12345678 =0xF0000000 *SAMPLE B* 9*65536*4096*27512614111 =66468250031744876544 =0x9A6EA93D70000000 12345678 =0x70000000 

Note that the topmost bit of SAMPLE B, which is exactly 9x SAMPLE A, is practically no different from the final 32-bit value - if I changed 9x to 17x, then the lower bits would be identical. However, if the upper most bits were not "lost" due to overflow and xord with lower 32 bits, then the value will be different.

+7
source share
3 answers

This is the advantage of multiplying by an odd number; earlier numbers do not fall completely from an integer. For the element to be lost, 31^n must be power 2, and this cannot be. In your case, for example, 31^7 you get 0x67E12CDF for a 32-bit number; thus, an input element multiplied by this value will still contribute to the result, despite the overflow.

+3
source

Is there any special reason why overflow is ignored? Its not so expensive to use a long one, as described earlier.

But there is almost no benefit from this. This methodology usually provides a good distribution of values ​​to begin with.

+2
source

I do not see the point in the examples. It seems to me that they are not related to the way you calculate the hash codes: a * 31 + b .

You might possibly find some a and b 's that would provide the same hash code (but where the high bits differ). Then it would be advisable for the cropping bits to return to the hash code.

Or another example would be for ((a * 31) + b )*31 + ... + z . Then find some a , b , ..., z , where the hash code no longer depends on a . So a will not be a significant contributor.

Of course, if you change 31 to 65536 , it's pretty easy to find those a , ..., z . Any value will be executed, all bits a simply fall away, a is shifted to the left and trimmed. But can you do it for 31 ? Or it looks like you could add fragile bits. But why? Can you find a case where this helps?

The problem with 65536 is that in binary it looks like 10000000000000000 . That way, when you multiply a number by it, in binary code there will again be these 16 zeros. For 31 , 11111 in binary format this will not happen.

Oh, I don’t mean that these examples do not exist because they do (this is just a hash after all). But you will not find many or similar examples.

0
source

All Articles