In Java, why should equals () and hashCode () have to be consistent?

Question

In Java, why should equals () and hashCode () have to be consistent?

If I override any method in the class, it must make sure that if A.equals(B) = true , then (A.hashCode() == B.hashCode) must also be true.

Can someone show me a simple example where, if this is broken, it will cause a problem? I think this is because if you use this class as a key type for a Hashmap?

+6

java

Saobi Nov 05 '09 at 3:42

source share

6 answers

This is discussed in Element 8: Always override hashCode when overriding Joshua Bloch peers. Effective Java:

A common source of errors is failure to override the hashCode method. You must override hashCode in every class that overrides peers. Failure to do so will violate the general contract for Object.hashCode, disconnect your class from working correctly in combination with all collections, including HashMap, HashSet and Hashtable.
Here is the contract copied from the java.lang.Object specification:
Whenever it is called by the same object more than once during application execution, the hashCode method must consistently return the same integer if the information used in equal comparisons with the object does not change. This integer should not remain consistent with one execution of the application on another execution of the same application.
If two objects are equal in accordance with the equals (Object) method, then calling the hashCode method for each of the two objects should lead to the same integer result.
It is not required that if two objects are not equal according to the equals (Object) method, calling the hashCode method for each of the two objects should produce different integer results. However, the programmer should be aware that obtaining separate integer results for unequal objects can improve the performance of hash tables.
The key condition that is violated when you cannot override hashCode is the second: equal objects must have the same hash codes. Two different instances can be logically equal according to the class method is equal, but for the hashCode method of the class of objects, theyre only two objects with nothing more in common. Therefore, the hashCode method of objects returns two seemingly random instead of two equal numbers, as required by the contract.
For example, consider the following simplified PhoneNumber class, whose equals method is built according to the recipe in paragraph 7:
 public final class PhoneNumber { private final short areaCode; private final short exchange; private final short extension; public PhoneNumber(int areaCode, int exchange, int extension) { rangeCheck(areaCode, 999, "area code"); rangeCheck(exchange, 999, "exchange"); rangeCheck(extension, 9999, "extension"); this.areaCode = (short) areaCode; this.exchange = (short) exchange; this.extension = (short) extension; } private static void rangeCheck(int arg, int max, String name) { if (arg < 0 || arg > max) throw new IllegalArgumentException(name +": " + arg); } public boolean equals(Object o) { if (o == this) return true; if (!(o instanceof PhoneNumber)) return false; PhoneNumber pn = (PhoneNumber)o; return pn.extension == extension && pn.exchange == exchange && pn.areaCode == areaCode; } // No hashCode method! ... // Remainder omitted } 
Suppose you are trying to use this class with a HashMap:
 Map m = new HashMap(); m.put(new PhoneNumber(408, 867, 5309), "Jenny"); 
At this point, you can expect m.get(new PhoneNumber(408 , 867, 5309)) to return "Jenny" , but this returns null . Note that there are two instances of PhoneNumber: One is used to insert into the HashMap, and the second, equal, instance is used to (attempt) a search. The PhoneNumber class refuses to redefine hashCode causes two equal unequal hash codes, in violation of the hashCode contract. Consequently, the get method looks for the phone number in another hash bucket from the one in which it was stored. Solving this problem is simply like providing the correct hashCode method for the PhoneNumber class. [...]

For full content, see Chapter 3 .

+7

Pascal thivent Nov 05 '09 at 4:03

source share

Here is a small example:

 Set<Foo> myFoos = new HashSet<Foo>(); Foo firstFoo = new Foo(123,"Alpha"); myFoos.add(firstFoo); // later in the processing you get another Foo from somewhere Foo someFoo = //use imagination here...; // maybe you get it from a database... and it equal to Foo(123,"Alpha) if (myFoos.contains(someFoo)) { // maybe you win a million bucks. }

So imagine a hashCode created for firstFoo , 99999 , and it ends at a specific place in myFoos HashSet. Later, when you get someFoo and you look for it in myFoos HashSet, it needs to generate the same hash code so you can find it.

+1

dustmachine Nov 05 '09 at 3:54

source share

Containers like HashSet rely on a hash function to determine where to put it and where to get it when asked. If A.equals(B) , then the HashSet expects A to be in the same place as B If you put A in with a value of V and look at B , you should expect V to return back (since you said A.equals(B) ). But if A.hashcode ()! = B.hashcode (), then hashset may not find where you put it.

+1

Smashery Nov 05 '09 at 4:00

source share

This is precisely due to hash tables.

Due to the possibility of hash code collisions, the hash tables must also check the identification, otherwise the table cannot determine if it found the object that it was looking for, or one with the same hash code. Therefore, each get() in the hash table calls key.equals(potentialMatch) before returning the value.

If equals() and hashCode() are incompatible, you can get very inconsistent behavior. Say for two objects a and b , a.equals(b) returns true, but a.hashCode() != b.hashCode() . Insert a and the HashSet will return false for .contains(b) , but the list created from this set will return true (because the list does not use hash codes).

 HashSet set = new HashSet(); set.add(a); set.contains(b); // false new ArrayList(set).contains(b); // true

Obviously, this can be bad.

+1

Justin fagnani Nov 05 '09 at 4:05

source share

The idea is that two objects are “equal” if all their fields have the same value. If all fields have the same value, two objects must have the same hash value.

0

FrustratedWithFormsDesigner Nov 05 '09 at 3:48

source share

cletus · Accepted Answer · 2009-11-05T03:55:12+0000

Of course:

 public class Test { private final int m, n; public Test(int m, int n) { this.m = m; this.n = n; } public int hashCode() { return n * m; } public boolean equals(Object ob) { if (ob.getClass() != Test.class) return false; Test other = (Test)ob; return m == other.m; } }

from:

 Set<Test> set = new HashSet<Test>(); set.put(new Test(3,4)); boolean b = set.contains(new Test(3, 10)); // false

Technically, this should be true, because m == 3 in both cases.

In general, HashMap works as follows: it has a variable number, which is usually called "buckets." The number of buckets can change over time (when records are added and deleted), but always has a value of 2.

Let's say this HashMap has 16 buckets. When you call put () to add an entry, the hashCode () of the key is computed, and then the mask is taken depending on the size of the buckets. If you are (bitwise) and hashCode () with 15 (0x0F), you will get the last 4 bits equal to a number from 0 to 15 inclusive:

 int factor = 4; int buckets = 1 << (factor-1) - 1; // 16 int mask = buckets - 1; // 15 int code = key.hashCode(); int dest = code & mask; // a number from 0 to 15 inclusive

Now, if there is already a record in this bucket, you have what is called a collision. There are several ways to deal with this, but the one used by the HashMap (and probably the most common overall) is bucketing. All records with the same masked hash code are placed in a list.

So, to find if a given key is already on the map:

Calculate a masked hash code;
Find the appropriate bucket;
If it is empty, the key was not found;
If is is not empty, skip all entries in the bucket by checking equals ().

Looking through the bucket is a linear operation (O (n)), but it is on a small subset. The hashcode bucket definition is essentially constant (O (1)). If the buckets are small enough, then access to the HashMap is usually described as "near O (1)".

You can make a couple of comments about this.

Firstly, if you have a bunch of objects that all return 42 as their hash code, then the HashMap will work, but it will work like an expensive list. Access will be O (n) (since everything will be in one bucket regardless of the number of buckets). I was actually asked in an interview.

Secondly, returning to the starting point, if two objects are equal (i.e. a. equals(b) == b.equals(a) == true ) but have different hash codes, then the HashMap will look for (possibly) the wrong bucket, which leads to unpredictable and undefined.

In Java, why should equals () and hashCode () have to be consistent?

More articles: