How does a HashMap in Java fill up when the load factor is greater than 1?

I created a HashMap as follows:

    HashMap<Integer,String> test = new HashMap<Integer,String>();
    test.put(1, "Value1");
    test.put(2, "Value2");
    test.put(3, "Value3");
    test.put(4, "Value4");
    test.put(5, "Value5");
    test.put(6, "Value6");
    test.put(7, "Value7");
    test.put(8, "Value8");
    test.put(9, "Value9");
    test.put(10, "Value10");
    test.put(11, "Value11");
    test.put(12, "Value12");
    test.put(13, "Value13");
    test.put(14, "Value14");
    test.put(15, "Value15");
    test.put(16, "Value16");
    test.put(17, "Value17");
    test.put(18, "Value18");
    test.put(19, "Value19");
    test.put(20, "Value20");

and I saw that each entry was placed in a different bucket, which means a different bucket was computed for each key's hash code. Now, if I change my code as follows:

    HashMap<Integer,String> test = new HashMap<Integer,String>(16, 2.0f);
    test.put(1, "Value1");
    test.put(2, "Value2");
    test.put(3, "Value3");
    test.put(4, "Value4");
    test.put(5, "Value5");
    test.put(6, "Value6");
    test.put(7, "Value7");
    test.put(8, "Value8");
    test.put(9, "Value9");
    test.put(10, "Value10");
    test.put(11, "Value11");
    test.put(12, "Value12");
    test.put(13, "Value13");
    test.put(14, "Value14");
    test.put(15, "Value15");
    test.put(16, "Value16");
    test.put(17, "Value17");
    test.put(18, "Value18");
    test.put(19, "Value19");
    test.put(20, "Value20");

I found that some of the values that were previously placed in different buckets are now placed in a bucket that already contains other values, even though their hash values are different. Can someone help me understand why?

thanks

+4
4 answers

If you initialize a HashMap without specifying an initial capacity and load factor, it is created with a capacity of 16 and a load factor of 0.75. This means that once the HashMap holds at least (initial capacity * load factor) entries, i.e. 12 entries, it is rehashed: the bucket array roughly doubles in size and all entries are redistributed into the new buckets.

Now you set the load factor to 2, which means the map will only be rehashed once it holds at least 32 entries.
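
As a quick illustration (my own arithmetic sketch; the class name ThresholdDemo is made up), the resize threshold is simply capacity * loadFactor:

    // Sketch: where the resize threshold lands for the two configurations discussed above.
    public class ThresholdDemo {
        public static void main(String[] args) {
            int capacity = 16;
            System.out.println("default (0.75): resize after " + (int) (capacity * 0.75f) + " entries"); // 12
            System.out.println("custom  (2.0):  resize after " + (int) (capacity * 2.0f) + " entries");  // 32
        }
    }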

What happens now is that entries whose hash modulo the bucket count is the same end up in the same bucket. Each bucket that contains more than one entry holds a list in which all of those entries are placed. When you then look up one of the entries, the bucket is first located using the hash, and after that the whole list in that bucket has to be walked until the matching key is found. That is quite expensive.
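
For illustration, here is a minimal sketch of that bucket-plus-list idea (my own simplified model, not HashMap's real implementation):

    import java.util.ArrayList;
    import java.util.List;

    // Simplified sketch of a chained hash table: an array of buckets, each bucket a list.
    // This only illustrates the idea described above; the real HashMap is more involved.
    class TinyChainedMap<K, V> {
        private static class Entry<K, V> {
            final K key;
            V value;
            Entry(K key, V value) { this.key = key; this.value = value; }
        }

        private final List<Entry<K, V>>[] buckets;

        @SuppressWarnings("unchecked")
        TinyChainedMap(int bucketCount) {
            buckets = new List[bucketCount];
            for (int i = 0; i < bucketCount; i++) buckets[i] = new ArrayList<>();
        }

        // hash modulo bucket count decides which bucket an entry lands in
        private int indexFor(K key) {
            return Math.floorMod(key.hashCode(), buckets.length);
        }

        void put(K key, V value) {
            List<Entry<K, V>> bucket = buckets[indexFor(key)];
            for (Entry<K, V> e : bucket) {
                if (e.key.equals(key)) { e.value = value; return; }
            }
            bucket.add(new Entry<>(key, value));
        }

        V get(K key) {
            // First find the bucket, then walk its list comparing keys.
            for (Entry<K, V> e : buckets[indexFor(key)]) {
                if (e.key.equals(key)) return e.value;
            }
            return null;
        }
    }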

So in the end there is a trade-off: rehashing is quite expensive, so you should try to avoid it. On the other hand, having several entries per bucket makes lookups expensive, so you should try to avoid that too. You need a balance between the two. An alternative is to set the initial capacity high enough up front, but that uses more memory that may never be needed.
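
For example (a sketch of my own, not from the original answer), if you know roughly how many entries you will store, you can presize the map so that no resize happens:

    import java.util.HashMap;
    import java.util.Map;

    // Sketch: presizing so that expectedEntries stays below capacity * loadFactor,
    // at the cost of allocating a larger bucket array up front.
    public class PresizeDemo {
        public static void main(String[] args) {
            int expectedEntries = 20;
            float loadFactor = 0.75f;
            // Capacity large enough that no rehash occurs for expectedEntries insertions.
            // HashMap rounds the capacity up to the next power of two internally.
            int initialCapacity = (int) Math.ceil(expectedEntries / loadFactor); // 27
            Map<Integer, String> map = new HashMap<>(initialCapacity, loadFactor);
            for (int i = 1; i <= expectedEntries; i++) {
                map.put(i, "Value" + i);
            }
            System.out.println(map.size()); // 20, inserted without any resize
        }
    }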

+7

In your second test, the initial capacity is 16 and the load factor is 2. This means the HashMap will use an array of 16 elements to store the entries (i.e. there are 16 buckets), and this array will only be resized when the number of entries in the map reaches 32 (16 * 2).

This means that some keys with different hash codes must be stored in the same bucket, because the number of buckets (16) is less than the total number of entries (20 in your case).

The bucket for a key is computed in 3 steps:

  • The hashCode method is called first.
  • Then an additional function is applied to the hashCode to reduce the damage that can be caused by poor hashCode implementations.
  • Finally, a modulo operation is applied to the result of the previous step to get a number between 0 and capacity - 1.

It is because of the third step that keys with different hashCodes can end up in the same bucket.
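
To make the three steps concrete, here is a sketch modeled on what OpenJDK 8's HashMap does internally: the extra function is h ^ (h >>> 16), and since the table length is a power of two the modulo is implemented as a bit mask, (n - 1) & hash. Treat the exact details as version-dependent:

    // Sketch of the 3-step bucket calculation, modeled on OpenJDK 8's HashMap.
    // Details may differ between JDK versions.
    public class BucketIndexDemo {

        // Step 2: spread the high bits down, as HashMap.hash(Object) does in JDK 8.
        static int spread(int hashCode) {
            return hashCode ^ (hashCode >>> 16);
        }

        // Step 3: reduce to a bucket index; for power-of-two capacities this mask
        // is equivalent to a modulo by the capacity.
        static int bucketIndex(int spreadHash, int capacity) {
            return (capacity - 1) & spreadHash;
        }

        public static void main(String[] args) {
            int capacity = 16;
            for (int key = 1; key <= 20; key++) {
                int h = Integer.valueOf(key).hashCode(); // Step 1: hashCode() (the value itself for Integer)
                System.out.println("key " + key + " -> bucket " + bucketIndex(spread(h), capacity));
            }
            // With 16 buckets, keys 1 and 17, 2 and 18, 3 and 19, 4 and 20 share buckets.
        }
    }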

+3

Let's check it with examples:

i) In the first case, the load factor is 0.75f and the initialCapacity is 16, which means the array will be resized when the number of entries in the HashMap reaches 16 * 0.75 = 12.

Each key has a different hashCode, and hashCode modulo 16 happens to be unique for them, so the first 12 entries go into different buckets; after that a resize happens, and the entries added afterwards also land in different buckets (hashCode modulo 32 is unique as well).

ii) In the second case, the load factor is 2.0f, which means resizing would only happen when the number of entries reaches 16 * 2 = 32. You keep putting entries into the map and it never resizes (for 20 entries), which leads to several entries colliding.

So, in a nutshell: in the first example, hashCode modulo 16 is unique for the first 12 entries and hashCode modulo 32 is unique for all entries, while in the second case it is always hashCode modulo 16 for all entries, which cannot be unique since all 20 entries have to fit into 16 buckets.
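
A quick sketch of my own to visualise this answer's argument, comparing hashCode modulo 16 with modulo 32 for the 20 Integer keys:

    // Sketch: Integer.hashCode() is the value itself, so the modulo arithmetic is easy to eyeball.
    public class ModuloDemo {
        public static void main(String[] args) {
            for (int key = 1; key <= 20; key++) {
                int h = Integer.hashCode(key);
                System.out.printf("key %2d -> h %% 16 = %2d, h %% 32 = %2d%n", key, h % 16, h % 32);
            }
            // h % 32 is unique for keys 1..20, while h % 16 repeats for keys 17..20,
            // which is why the 16-bucket table ends up with collisions.
        }
    }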

+2

Javadoc explanation:

An instance of HashMap has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets.

As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the HashMap class, including get and put). The expected number of entries in the map and its load factor should be taken into account when setting its initial capacity, so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.

By default, the initial capacity is 16 and the load factor is 0.75. Therefore, when the number of entries goes beyond 12 (16 * 0.75), the capacity is increased to 32 and the hash table is rehashed. That is why, in your first case, every element has its own bucket.

In your second case, the hash table will only be resized once the number of entries crosses 32 (16 * 2). Even though the elements have different hash values, they may collide when hashCode % bucketCount is computed. That is why you see more than one item in a single bucket.
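
If you want to observe this directly, here is a sketch that peeks at the internal bucket array via reflection (the field name "table" is an OpenJDK implementation detail, and on Java 9+ you need --add-opens java.base/java.util=ALL-UNNAMED for it to work):

    import java.lang.reflect.Field;
    import java.util.HashMap;

    // Sketch: inspecting HashMap's internal bucket array ("table" in OpenJDK) via reflection
    // to see whether it was resized. Depends on OpenJDK internals; not for production use.
    public class TableLengthDemo {
        static int tableLength(HashMap<?, ?> map) throws Exception {
            Field tableField = HashMap.class.getDeclaredField("table");
            tableField.setAccessible(true);
            Object[] table = (Object[]) tableField.get(map);
            return table == null ? 0 : table.length;
        }

        public static void main(String[] args) throws Exception {
            HashMap<Integer, String> defaults = new HashMap<>();       // capacity 16, load factor 0.75
            HashMap<Integer, String> custom = new HashMap<>(16, 2.0f); // capacity 16, load factor 2.0
            for (int i = 1; i <= 20; i++) {
                defaults.put(i, "Value" + i);
                custom.put(i, "Value" + i);
            }
            System.out.println("load factor 0.75: " + tableLength(defaults) + " buckets"); // expected 32
            System.out.println("load factor 2.0:  " + tableLength(custom) + " buckets");   // expected 16
        }
    }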

0
