Resize Java HashMap

Suppose we have some code:

    import java.util.HashMap;

    class WrongHashCode {
        public int code = 0;

        @Override
        public int hashCode() {
            return code;
        }
    }

    public class Rehashing {
        public static void main(String[] args) {
            // Initial capacity is 2 and load factor is 75%
            HashMap<WrongHashCode, String> hashMap = new HashMap<>(2, 0.75f);
            WrongHashCode wrongHashCode = new WrongHashCode();

            // Put the entry that is about to be "lost"
            hashMap.put(wrongHashCode, "Test1");
            // Change the hash code of the same key object
            wrongHashCode.code++;
            // This put crosses the load factor barrier, so the map resizes
            hashMap.put(wrongHashCode, "Test2");

            // Always prints 2
            System.out.println("Keys count " + hashMap.keySet().size());
        }
    }

So my question is: why, after the hashMap is resized (which, as I understand it, involves rehashing the keys), do we still have 2 keys in the keySet instead of 1 (since the key object is the same for both existing KV pairs)?

+7
java hashmap
5 answers

So my question is why, after resizing the hashMap (which, as I understand it, involves rehashing the keys)

This actually does not involve re-invoking hashCode() on the keys; at least not in the HashMap code, except under certain circumstances (see below). It involves moving the entries into new buckets. Inside the HashMap there is an Entry class with the following fields:

    final K key;
    V value;
    Entry<K,V> next;
    int hash;

The hash field is the stored hash code for the key, which is calculated when put(...) is called. This means that if you change the hash code of your object, it will not affect the entry already in the HashMap unless you re-put the key into the map. Of course, if you change the key's hash code, you won't even be able to find it in the HashMap, because its current hash code no longer matches the stored one.
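
Here is a minimal sketch of that lookup failure, reusing the WrongHashCode class from the question (the class name StoredHashDemo is made up for illustration):

    import java.util.HashMap;

    public class StoredHashDemo {
        public static void main(String[] args) {
            HashMap<WrongHashCode, String> map = new HashMap<>();
            WrongHashCode key = new WrongHashCode();
            map.put(key, "value");  // entry stored with the hash for code == 0
            key.code++;             // the key now reports a different hash
            // get(...) probes the bucket for the new hash, but the entry
            // still sits in the bucket chosen for the old one, so this
            // prints null.
            System.out.println(map.get(key));
        }
    }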

do we still have 2 keys in the keySet instead of 1 (since the key object is the same for both existing KV pairs)?

So although you changed the hash code of that one object, it now sits in the map under two entries whose stored hash fields differ.


All that said, there is code inside HashMap that can rehash the keys when the HashMap is resized; see the package-private method HashMap.transfer(...) in JDK 7 (at least). This is why the hash field above is not final. It is used, however, only when initHashSeedAsNeeded(...) returns true, which turns on "alternative hashing". The following sets the threshold number of entries at which alternative hashing is enabled:

 -Djdk.map.althashing.threshold=1 

With this set on the JVM, I can actually get hashCode() to be called again when the resize happens, but I cannot get the second put(...) to be treated as an overwrite. Part of the problem is that HashMap.hash(...) XORs the hash code with an internal hashSeed that changes on resize, but by the time of the second put(...) the new hash code for the incoming entry has already been stored.
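
One way to observe this yourself is to add tracing to the key's hashCode() and watch for extra invocations during the resize. A sketch, assuming the JDK 7 flag above is set (TracingHashCode is a hypothetical drop-in replacement for the question's key class):

    class TracingHashCode {
        public int code = 0;

        @Override
        public int hashCode() {
            // Prints on every invocation, so calls made during a resize
            // (with -Djdk.map.althashing.threshold=1 on JDK 7) show up.
            System.out.println("hashCode() called, returning " + code);
            return code;
        }
    }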

+7

HashMap actually caches the hash code for each key (since a key's hash code can be expensive to compute). So although you changed the hashCode of the existing key, the Entry it is linked to in the HashMap still carries the old one (and therefore ends up in the "wrong" bucket after resizing).
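
A small sketch of that effect, again assuming the WrongHashCode class from the question: the stale entry is still reachable by iteration, even though a direct lookup misses it.

    import java.util.HashMap;
    import java.util.Map;

    public class StaleBucketDemo {
        public static void main(String[] args) {
            HashMap<WrongHashCode, String> map = new HashMap<>();
            WrongHashCode key = new WrongHashCode();
            map.put(key, "value");
            key.code++;  // the hash cached in the Entry is now stale

            // Iteration walks every bucket, so the entry still shows up.
            for (Map.Entry<WrongHashCode, String> e : map.entrySet()) {
                System.out.println("via iteration: " + e.getValue());
            }
            // Direct lookup hashes the mutated key, probes the wrong
            // bucket, and prints null.
            System.out.println("via get(): " + map.get(key));
        }
    }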

You can see this for yourself in the JDK code for HashMap.resize() (or, a little easier to follow, in the Java 6 code for HashMap.transfer()).

+7

I cannot find it clearly documented, but mutating a key in a way that changes its hashCode() will usually break a HashMap.

HashMap distributes its entries among b buckets. You can imagine that a key with hash h is assigned to bucket h % b. When the map receives a new entry, it works out which bucket that entry belongs to, checks whether an equal key already exists in that bucket, and finally adds the entry to the bucket, removing any matching key it found.
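
A toy illustration of that bucket arithmetic (this is only the conceptual model; the real HashMap masks against a power-of-two capacity rather than using %):

    public class BucketModelDemo {
        public static void main(String[] args) {
            int b = 8;       // hypothetical number of buckets
            int before = 0;  // hash when the entry was inserted
            int after = 1;   // hash after code++ on the same key
            System.out.println(before % b);  // 0: bucket the entry lives in
            System.out.println(after % b);   // 1: bucket probed afterwards
        }
    }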

Because its hash code changed, the wrongHashCode object will (in general, and here in fact) be directed to a different bucket the second time, so its first entry is neither found nor removed.

In short, changing the hash of an already-inserted key breaks the HashMap; what you get afterwards is unpredictable, but it can result in (a) the key not being found at all, or (b) two or more entries under what is now an identical key.
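
If a key really must be mutated, the usual workaround is to take the entry out of the map before the change and re-insert it afterwards, so the stored hash is recomputed. A sketch, reusing the question's WrongHashCode class:

    import java.util.HashMap;

    public class SafeMutationDemo {
        public static void main(String[] args) {
            HashMap<WrongHashCode, String> map = new HashMap<>();
            WrongHashCode key = new WrongHashCode();
            map.put(key, "Test1");

            // Remove while the stored hash still matches, then mutate,
            // then put back so the hash is computed fresh.
            String value = map.remove(key);
            key.code++;
            map.put(key, value);

            System.out.println(map.size());  // 1: no duplicate entry
        }
    }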

+2

I cannot say why the other answers rely on HashMap.transfer for their examples, when that method is entirely absent in Java 8. So I will add my small input, considering Java 8.

The entries in a HashMap are indeed re-hashed, but not in the sense you might think. A re-hash basically re-spreads the hash code already provided (by you) via Key#hashCode; there is a method that does this:

    static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

So basically, when you compute your hash code, the HashMap essentially says "I don't trust you enough" and re-hashes your hash code so the bits are potentially better distributed (it actually XORs the upper 16 bits into the lower 16 bits).
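
A worked example of that spreading, using an arbitrary hash code:

    public class SpreadDemo {
        public static void main(String[] args) {
            int h = 0x7FFF0001;           // hypothetical key.hashCode()
            int spread = h ^ (h >>> 16);  // fold the upper 16 bits into the lower 16
            System.out.println(Integer.toHexString(h));       // 7fff0001
            System.out.println(Integer.toHexString(spread));  // 7fff7ffe
        }
    }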

On the other hand, when the HashMap resizes, it means that the number of bins/buckets doubles; and because the number of bins is always a power of two, an entry in a given bin will either stay in the same bucket or move to the bucket offset by the old number of bins. You can find details on how this is done in this related question.

So once a resize occurs, there is no additional re-hashing; in effect, one more bit of the stored hash is taken into account, and thus the entry either moves or stays where it is. And Gray's answer is correct in the sense that each Entry has a hash field that is computed only once: the first time you put that Entry.
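
A small sketch of that one-extra-bit rule, with an arbitrary stored hash:

    public class ResizeSplitDemo {
        public static void main(String[] args) {
            int hash = 21;    // arbitrary stored hash (binary 10101)
            int oldCap = 16;  // capacity before the resize

            System.out.println(hash & (oldCap - 1));      // 5: bucket before resize
            // The extra bit examined on resize; non-zero means the entry moves.
            System.out.println(hash & oldCap);            // 16
            System.out.println(hash & (2 * oldCap - 1));  // 21 = 5 + 16: bucket after
        }
    }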

+2

Because the HashMap stores its elements in an internal table, and incrementing code does not affect that table:

    public V put(K key, V value) {
        if (key == null)
            return putForNullKey(value);
        int hash = hash(key.hashCode());
        int i = indexFor(hash, table.length);
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }
        modCount++;
        addEntry(hash, key, value, i);
        return null;
    }

And addEntry:

    void addEntry(int hash, K key, V value, int bucketIndex) {
        Entry<K,V> e = table[bucketIndex];
        table[bucketIndex] = new Entry<K,V>(hash, key, value, e);
        if (size++ >= threshold)
            resize(2 * table.length);
    }

As you can see, the new entry is stored via table[bucketIndex] = new Entry<K,V>(hash, ...), with the hash captured at that moment; so if you increment code afterwards, the change is not reflected here.

Try making the code field an Integer and see what happens.
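
The broader takeaway from all the answers above is to keep hash-relevant key state immutable, so the stored hash can never go stale. A sketch of such a key class (the name StableKey is made up for illustration):

    final class StableKey {
        private final int code;  // final: the hash can never change

        StableKey(int code) {
            this.code = code;
        }

        @Override
        public int hashCode() {
            return code;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof StableKey && ((StableKey) o).code == code;
        }
    }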

0
