Can someone explain how the Java HashMap hash() function works?

Question

After reading the JDK source code, I found that the HashMap hash() function looks odd. Its source code is as follows:

  static int hash(int h) {
      // This function ensures that hashCodes that differ only by
      // constant multiples at each bit position have a bounded
      // number of collisions (approximately 8 at default load factor).
      h ^= (h >>> 20) ^ (h >>> 12);
      return h ^ (h >>> 7) ^ (h >>> 4);
  }

The h parameter is the hashCode of the object that was put into the HashMap. How does this method work, and why? How can it protect against weak hashCode functions?

java hashmap hash

Cloude lee Jan 22 '13 at 6:57

1 answer

qsys · Accepted Answer · 2013-01-23T12:38:24+0000

Hashtable uses the "classic" prime approach: to get the index of a value, you take the hash of the key and compute its modulus against the table size. Taking a prime number as the size (usually) gives a nice spread over the indexes (depending on the hash, of course).
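As a rough illustration of the prime-modulus idea, here is a minimal sketch. The class name, the helper indexFor and the table size 11 are assumptions chosen for the example, not a copy of the JDK code.

```java
class PrimeModuloIndex {
    // Clear the sign bit so the remainder is never negative,
    // then take the remainder against the table size.
    static int indexFor(int hash, int tableSize) {
        return (hash & 0x7FFFFFFF) % tableSize;
    }

    public static void main(String[] args) {
        int tableSize = 11; // a prime, assumed here for illustration
        // Hash values that share the same low 4 bits still land in different slots.
        for (int hash = 0; hash <= 64; hash += 16) {
            System.out.println(hash + " -> " + indexFor(hash, tableSize));
        }
    }
}
```

With a prime size, the hashes 0, 16, 32, 48 and 64 end up in slots 0, 5, 10, 4 and 9, even though their low-order bits are identical.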

HashMap uses the "power of two" approach, meaning the table sizes are powers of two. The reason is that this is supposed to be faster than the prime-based calculation. However, since a power of two is not a prime, there would be more collisions, especially among hash values that share the same low-order bits.

Why? The modulus against the size, which gives the (bucket/slot) index, is simply computed as hash & (size-1) (this is what HashMap uses to get the index!). That is the basic problem with the power-of-two approach: if the length is small, for example 16, the default for a HashMap, only the lowest bits are used, so hash values with the same low bits will end up at the same (bucket) index. With a size of 16, only the last 4 bits are used to calculate the index.
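A small sketch of that masking step may help. The class name, the helper indexFor and the sample hash values are made up for illustration.

```java
class PowerOfTwoIndex {
    // For a power-of-two length, masking with (length - 1) keeps only the low bits;
    // for non-negative hashes this equals hash % length.
    static int indexFor(int hash, int length) {
        return hash & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16; // only the lowest 4 bits survive the mask
        int[] hashes = {0x10, 0x20, 0x30, 0xABCD0};
        for (int hash : hashes) {
            // Every sample ends in four zero bits, so all of them map to index 0.
            System.out.println(Integer.toHexString(hash) + " -> " + indexFor(hash, length));
        }
    }
}
```

All four sample values collide in bucket 0, no matter how different their higher bits are.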

That's why an extra hash is computed: basically, it shifts the higher bits down and XORs them with the lower bits. As for the reason behind the numbers 20, 12, 7 and 4, I really don't know. They used to be different (in Java 1.5 or so, the hash function was slightly different). I suppose there is more advanced literature on this. You can find more about why such numbers are chosen in all kinds of algorithm-related literature, for example:

http://en.wikipedia.org/wiki/The_Art_of_Computer_Programming

http://mitpress.mit.edu/books/introduction-algorithms

http://burtleburtle.net/bob/hash/evahash.html#lookup uses different algorithms depending on the length (which makes sense).

http://www.javaspecialists.eu/archive/Issue054.html is probably interesting. Have a look at Joshua Bloch's reaction at the bottom of the article: "The replacement secondary hash function (which I developed with the aid of a computer) has strong statistical properties that pretty much guarantee good bucket distribution." So, if you ask me, the numbers come from some kind of analysis performed by Josh himself, probably aided by who knows what.

So: a power-of-two size gives a faster index calculation, but it requires the additional hash step to get a nice spread over the slots/buckets.
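To make the effect of the extra hash concrete, the sketch below applies the hash() function quoted in the question before masking. The class name and the sample inputs are assumptions for illustration, and the behaviour shown is specific to this version of the function.

```java
class SpreadDemo {
    // The supplemental hash exactly as quoted in the question.
    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Power-of-two index: keep only the low bits.
    static int indexFor(int hash, int length) {
        return hash & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16;
        for (int h = 0x10; h <= 0x40; h += 0x10) {
            int raw = indexFor(h, length);          // always 0: the low 4 bits are identical
            int mixed = indexFor(hash(h), length);  // 1, 2, 3, 4: higher bits were folded in
            System.out.println(Integer.toHexString(h) + ": raw=" + raw + ", mixed=" + mixed);
        }
    }
}
```

Without the extra step all four keys share bucket 0. After it, they land in buckets 1 to 4, because the bits above the mask have been XORed into the low bits.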
