Should I override hashCode() for a class that contains collections?

Given that I have a class with various fields in it:

class MyClass {
    private String s;
    private MySecondClass c;
    private Collection<someInterface> coll;
    // ...

    @Override
    public int hashCode() {
        // ????
    }
}

and I have different objects of it that I would like to store in a HashMap. For this I need a hashCode() for MyClass.

  • I need to go into all fields and the corresponding parent classes recursively to make sure that they all implement hashCode() correctly, because otherwise, hashCode() of MyClass may not take into account some values. Is it correct?

  • What should I do with this Collection? Can I always rely on its hashCode() method? Will it take into account all child values that may exist in my someInterface objects?


I opened a second question about the real problem, uniquely identifying an object, here: How do I create an (almost) unique hash identifier for objects?


Clarification:

Is there anything more or less unique in your class? String s? Then use it as the hash code.

The hashCode() values of two MyClass objects should definitely differ if any value in the coll of one of the objects is changed. hashCode() should return the same value only if all fields of the two objects hold the same values, recursively. Basically, a time-consuming calculation is performed on a MyClass object. I want to skip that calculation if it was already done with the same values some time ago. To do this, I would like to look up in a HashMap whether the result is already available.

Do you use MyClass in the HashMap as a key or as a value? If it is a key, you must override both equals() and hashCode().

So I use the hashCode of MyClass as the key in the HashMap. The value (the result of the calculation) will be something else, for example an Integer (simplified).

What do you think equality means for collections? Should it depend on the ordering of the elements? Should it depend only on which elements are present?

Doesn't that depend on the type of collection that is stored in coll? Although I think ordering is not very important, no.

The answers you get on this site are great. Thanks, everyone.

@AlexWien, that depends on whether these collection items are part of the class's equality definition or not.

Yes, yes, they are.

+6
3 answers
  • I need to go into all the fields and the corresponding parent classes recursively to make sure that they all implement hashCode() correctly, because otherwise hashCode() of MyClass may not take into account some values. Is it correct?

That is correct. It is not as burdensome as it seems, because the rule of thumb is that you only need to override hashCode() if you override equals(). You do not need to worry about classes that use the default equals(); the default hashCode() will suffice for them.

Also, for your class, you only need to hash the fields that you compare in your equals() method. For example, if one of these fields is a unique identifier, you can simply check that field in equals() and hash it in hashCode().

All of this assumes that you also override equals(). If you have not overridden it, do not bother with hashCode().

  • What should I do with this Collection? Can I always rely on its hashCode() method? Will it take into account all child values that may exist in my someInterface objects?

Yes, you can rely on every collection type in the standard Java library to implement hashCode() correctly. And yes, any List or Set takes its contents into account (it combines the hash codes of its elements).
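As a small illustration of the point about List and Set, the sketch below compares the hash codes of a few standard collections. The class name CollectionHashDemo and the sample values are made up for this example; the behavior shown follows from the documented contracts of List.hashCode() and Set.hashCode().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical demo class, not part of the original question.
class CollectionHashDemo {
    public static void main(String[] args) {
        List<String> arrayList = new ArrayList<>(Arrays.asList("a", "b"));
        List<String> linkedList = new LinkedList<>(Arrays.asList("a", "b"));
        List<String> reversed = new ArrayList<>(Arrays.asList("b", "a"));

        // Lists with equal elements in the same order are equal and hash alike,
        // regardless of the concrete List implementation.
        System.out.println(arrayList.equals(linkedList));                  // true
        System.out.println(arrayList.hashCode() == linkedList.hashCode()); // true

        // For a List, element order is part of equality.
        System.out.println(arrayList.equals(reversed));                    // false

        Set<String> hashSet = new HashSet<>(Arrays.asList("a", "b"));
        Set<String> treeSet = new TreeSet<>(Arrays.asList("b", "a"));

        // For a Set, only membership matters, not insertion order.
        System.out.println(hashSet.equals(treeSet));                       // true
        System.out.println(hashSet.hashCode() == treeSet.hashCode());      // true
    }
}
```

Note that this only helps if the element type itself (someInterface implementations in the question) has equals() and hashCode() that reflect its contents.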

+6

So you want to compute, from the contents of your object, a unique key that you can look up in a HashMap to check whether the "heavy" calculation, which you do not want to run twice, has already been done for this deep combination of field values.

Using hashCode only

I believe that hashCode() is not suitable for the scenario you are describing.

hashCode() should always be used together with equals(). This is part of its contract, and an important part, because hashCode() returns an int. Although you can try to make hashCode() as well distributed as possible, it will not be unique for every possible object of the same class, except in very specific cases (it is easy for Integer, Byte and Character, for example).

If you want to see for yourself, try creating strings up to 4 letters long (lower and upper case) and see how many of them have the same hash codes.
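A minimal sketch of that experiment is shown below. The class and method names are made up for this example. It generates every string of letters up to a given length and counts how many produce a hash code that an earlier string already produced. The main method uses a maximum length of 3 to keep it quick; raising it to 4 runs the experiment described above, at the cost of more time and memory.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class for counting String.hashCode() collisions.
class HashCollisionDemo {
    private static final String LETTERS =
            "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

    /** Counts strings of length 1..maxLength whose hash code was already seen. */
    static int countCollisions(int maxLength) {
        return generate("", maxLength, new HashSet<>());
    }

    private static int generate(String prefix, int remaining, Set<Integer> seen) {
        if (remaining == 0) {
            return 0;
        }
        int collisions = 0;
        for (int i = 0; i < LETTERS.length(); i++) {
            String candidate = prefix + LETTERS.charAt(i);
            // Set.add returns false when this hash code is already present.
            if (!seen.add(candidate.hashCode())) {
                collisions++;
            }
            collisions += generate(candidate, remaining - 1, seen);
        }
        return collisions;
    }

    public static void main(String[] args) {
        // A well-known colliding pair: both strings hash to 2112.
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true
        System.out.println(countCollisions(3) + " colliding strings of length <= 3");
    }
}
```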

HashMap therefore uses both the hashCode() and equals() methods when it looks things up in the hash table. There will be elements that have the same hashCode(), and you can only tell whether it is the same element or not by testing all of them with the equals() of your class.

Using hashCode and equals together

In this approach, you use the object itself as the key in the hash map and give it an appropriate equals method.

To implement the equals method, you need to examine all the fields deeply. All their classes should have an equals() that matches what you consider equal for the purposes of your big calculation. Extra care must be taken when your objects implement an interface. If the calculation is based on calls to this interface, and different objects that implement the interface return the same values for these calls, then they should implement equals in a way that reflects this.

And their hashCode must be consistent with equals: when two values are equal, their hash codes must be equal.

Then you build your equals and hashCode from all of these elements. You can use Objects.equals(Object, Object) and Objects.hash(Object...) to save a lot of boilerplate.
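A minimal sketch of this approach, under some assumptions: the class name CalcInput and its two fields are simplified stand-ins for the MyClass in the question, and the collection is typed as List<Integer> because List (unlike the bare Collection interface) has a defined, content-based equals().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical, simplified stand-in for MyClass from the question.
final class CalcInput {
    private final String s;
    private final List<Integer> coll;

    CalcInput(String s, List<Integer> coll) {
        this.s = s;
        // Defensive copy, so later changes to the caller's list cannot alter this key.
        this.coll = new ArrayList<>(coll);
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof CalcInput)) {
            return false;
        }
        CalcInput that = (CalcInput) other;
        // Objects.equals is null-safe and delegates to the fields' own equals().
        return Objects.equals(s, that.s) && Objects.equals(coll, that.coll);
    }

    @Override
    public int hashCode() {
        // Hash exactly the fields that equals() compares.
        return Objects.hash(s, coll);
    }
}
```

With this in place, a Map<CalcInput, Integer> can serve as the result cache: looking up a newly built CalcInput with the same field values finds the entry stored under an earlier, equal instance.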

But is this a good approach?

While you can cache the result of hashCode() in the object and reuse it without recomputing as long as you do not mutate the object, you cannot do the same for equals. This means that evaluating equals will be slow.

Therefore, depending on how many times the equals() method is called for each object, this cost adds up.

If, for example, you have 30 objects in the HashMap, but 300,000 objects come along and are compared against them only to find that they are equal, you will incur 300,000 heavy comparisons.

If you only have a few cases in which objects have the same hashCode or fall into the same bucket in the HashMap and therefore require a comparison, then the equals() approach may work well.

If you decide to go this way, you need to remember:

If an object is a key in a HashMap, it must not be mutated at all. If you need to change it, you may need to make a deep copy and keep the copy in the hash map. Deep copying again requires looking at all the objects and interfaces inside to make sure that they can be copied at all.

Creating a unique key for each object

Back to the original idea: we found that hashCode is not a good candidate for a key in a hash map. A better candidate would be a hash function like MD5 or SHA-1 (or more advanced hashes like SHA-256, although you do not need cryptographic strength in your case), where collisions are far less likely than with a plain int. You can take all the values in your class, convert them to a byte array, hash it with such a hash function, and use the resulting hexadecimal string as the map key.

Naturally, this is not a trivial calculation. Therefore, you need to consider whether it really saves you much time compared to the calculation that you are trying to avoid. It will probably be faster than repeatedly calling equals() to compare objects, since you only do this once for each instance, with the values it had at the time of the "big calculation".

For a given instance, you can cache the result and not compute it again as long as you do not mutate the object. Or you could simply compute it again just before doing the "big calculation".

However, you will need the cooperation of all the objects that you hold in your class. That is, they should all be reasonably convertible to a byte array, so that two equivalent objects produce the same bytes (including the same issue with interface objects that I mentioned above).

You should also beware of situations in which you have, for example, two strings "AB" and "CD" that, when simply concatenated, give the same bytes as "A" and "BCD", so that you get the same hash for two different objects.
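A minimal sketch of such a key, under the assumption that each field can be rendered as a string: the class DigestKey and the method keyOf are made-up names, and SHA-256 is just one possible choice of hash function. Each field is prefixed with its byte length before it is fed to the digest, which is one way to avoid the "AB" + "CD" versus "A" + "BCD" ambiguity.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

// Hypothetical helper that derives a hex map key from a list of field values.
class DigestKey {
    static String keyOf(List<String> fields) {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256.
            throw new IllegalStateException(e);
        }
        for (String field : fields) {
            byte[] bytes = field.getBytes(StandardCharsets.UTF_8);
            // Length prefix: makes the field boundaries part of the hashed data.
            digest.update(ByteBuffer.allocate(4).putInt(bytes.length).array());
            digest.update(bytes);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```

The resulting 64-character string can then be used as the key of a Map<String, Integer> that holds the calculation results.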

+2

From your clarifications:

You want to store MyClass objects in the HashMap as keys. This means that hashCode() must not change after an object has been added. Therefore, if your collections may change after you create the object, they should not be part of hashCode().

From http://docs.oracle.com/javase/8/docs/api/java/util/Map.html

Note: great care must be exercised if mutable objects are used as map keys. The behavior of a map is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is a key in the map.
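To make that warning concrete, here is a small sketch (the class name MutableKeyDemo and the values are made up) that uses a mutable List as a HashMap key and then changes it. After the change the entry is still in the map, but it can no longer be found through the key.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo of a map key that is mutated after insertion.
class MutableKeyDemo {
    public static void main(String[] args) {
        Map<List<String>, Integer> results = new HashMap<>();

        List<String> key = new ArrayList<>();
        key.add("a");
        results.put(key, 42);
        System.out.println(results.get(key)); // 42

        // Mutating the key changes its hashCode(), but the map still
        // holds the entry under the hash computed at insertion time.
        key.add("b");
        System.out.println(results.get(key)); // null
        System.out.println(results.size());   // 1
    }
}
```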

For 20-100 objects, it is not worth risking an inconsistent implementation of hashCode() or equals().

There is no need to override hashCode() and equals() in your case. If you do not override them, Java uses object identity for equals() and hashCode() (and this works, especially because you stated that you do not need an equals() that considers the values of the object's fields).

When using the default implementation, you are safe.

Introducing a bug, such as using a custom hashCode() for a HashMap key whose hash code changes after insertion because the collection's hashCode() is part of the object's hash code, can lead to an error that is extremely hard to track down.

If you need to find out whether the heavy calculation has already been done, I would not abuse equals(). Just write your own objectStateValue() method and call hashCode() on the collection. This does not interfere with the object's hashCode() and equals().

public int objectStateValue() {
    // TODO make sure the fields are not null
    return 31 * s.hashCode() + coll.hashCode();
}

Another, simpler option: the code that performs the time-consuming calculation can increment a counter by one as soon as the calculation is done. Then you just check whether the counter has changed. This is much cheaper and simpler.

-1
