Why do hash maps in Java 8 use a binary tree instead of a linked list?

I recently found out that Java 8 hash maps use a binary tree instead of a linked list for buckets with many entries, and that the hash code is used as the branching criterion. I understand that in the event of heavy collisions this reduces lookup from O(n) to O(log n). My question is: what good does this really do, since the amortized time complexity is still O(1)? Perhaps if you forced all entries into the same bucket by giving every key the same hash code you would see a significant time difference, but nobody in their right mind would do that.

A binary tree also uses more space than a singly linked list, since each node stores both a left and a right child. Why increase the space complexity when there is, apart from a few contrived test cases, no improvement in time complexity?
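To make the scenario concrete, here is a minimal sketch of what "giving every key the same hash code" means; the class name and the constant are mine, purely for illustration:

```java
// Hypothetical key type whose hashCode is deliberately degenerate:
// every instance lands in the same HashMap bucket.
final class BadKey {
    final int id;

    BadKey(int id) { this.id = id; }

    @Override
    public boolean equals(Object o) {
        return o instanceof BadKey && ((BadKey) o).id == id;
    }

    @Override
    public int hashCode() { return 42; } // constant: all keys collide
}
```

With n such keys, every operation has to search a single bucket; this is the degenerate case the question is asking about.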

java hashmap java-8 theory
2 answers

This is mostly a security change. While in a normal situation there are rarely many collisions, if the hash keys come from an untrusted source (for example, HTTP header names received from a client), then it is not particularly hard to craft the input so that all the keys have the same hash code. If you then perform many lookups, you can face a denial of service. It turned out that quite a lot of code in the wild is vulnerable to this kind of attack, so it was decided to fix it on the Java side.
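As an illustration of how cheap such input is to craft, here is a sketch of the well-known trick (mine, not from the answer above): the strings "Aa" and "BB" have the same hash code, so every concatenation of n such two-character blocks yields 2^n distinct strings that all collide.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CollisionDemo {

    // "Aa".hashCode() == "BB".hashCode() == 2112, so any string built
    // from n of these blocks collides with 2^n - 1 other strings.
    static List<String> collidingKeys(int blocks) {
        List<String> keys = new ArrayList<>();
        keys.add("");
        for (int i = 0; i < blocks; i++) {
            List<String> next = new ArrayList<>(keys.size() * 2);
            for (String k : keys) {
                next.add(k + "Aa");
                next.add(k + "BB");
            }
            keys = next;
        }
        return keys;
    }

    public static void main(String[] args) {
        List<String> keys = collidingKeys(15); // 32768 keys, one hash code
        Map<String, Integer> map = new HashMap<>();
        for (String k : keys) {
            map.put(k, 0);
        }
        long start = System.nanoTime();
        for (String k : keys) {
            map.get(k);
        }
        System.out.printf("%d lookups took %d ms%n",
                keys.size(), (System.nanoTime() - start) / 1_000_000);
    }
}
```

Since String implements Comparable, a Java 8 HashMap keeps such a bucket as a balanced tree and the lookups stay O(log n) each; on Java 7 the same loop is O(n) per lookup, quadratic overall.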

See JEP-180 for details.


There are incorrect assumptions in your question.

  • A bucket collision is not necessarily a hash collision. Two objects do not need to have the same hash code to end up in the same bucket. A bucket is an element of an array, and the hash code has to be mapped to a particular index (see the sketch after this list). Since the array size must stay reasonable in relation to the size of the Map, you cannot arbitrarily raise the array size to avoid bucket collisions. There is even the theoretical limitation that the array size can be at most 2³¹, while there are 2³² possible hash codes.
  • Having hash collisions is not a sign of poor programming. For any type whose value space is larger than 2³², distinct objects with the same hash code are unavoidable. String is an obvious example, but even a Point carrying two int values or an ordinary Long key has unavoidable hash collisions. So they may be more common than you think, and it depends a lot on the use case.
  • The implementation switches to a binary tree only when the number of entries in a bucket exceeds a certain threshold (eight entries in the OpenJDK implementation), so the higher memory cost applies only where it pays off. There also seems to be a general misunderstanding about how the trees work: since a bucket collision is not necessarily a hash collision, the tree search first compares the hash codes. Only when the hash codes are identical and the keys properly implement Comparable is their natural ordering used. The examples you may have found on the Internet intentionally use the same hash code for all objects in order to demonstrate the Comparable-based ordering, which otherwise would not come into play; what they provoke is only the last resort.
  • As Tagir noted, this issue can affect the security of software, since slow lookups can open the door to DoS attacks. There were several attempts in previous JRE versions to address this problem that had more drawbacks than the memory consumption of the tree; for example, Java 7 tried to randomize the mapping of hash codes to array entries, which caused the initialization overhead described in this bug report. That is not the case with the new implementation.
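For the first point above, here is a rough sketch of how a hash code becomes a bucket index. It paraphrases what OpenJDK 8's HashMap does (spreading the hash, then masking it down to a power-of-two table size); it is not the exact source code:

```java
// Mix the upper bits of the hash code into the lower bits, since the
// index mask below only looks at the low bits.
static int spread(int h) {
    return h ^ (h >>> 16);
}

// tableLength is always a power of two, so (tableLength - 1) is a bit
// mask selecting the low bits of the spread hash.
static int bucketIndex(int h, int tableLength) {
    return (tableLength - 1) & spread(h);
}
```

With a 16-slot table, the distinct hash codes 1 and 17 both map to index 1: a bucket collision without a hash collision.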
