Is String.hashCode() portable across virtual machines, JDKs and operating systems?

An interesting problem came up recently. We came across some code that uses hashCode() as a source of salts for MD5 hashing, which raises the question: will hashCode() return the same value for the same object on different virtual machines, different JDK versions and different operating systems? Even if this is not guaranteed, has it ever actually changed so far?

EDIT: I really mean String.hashCode(), not the more general Object.hashCode(), which of course can be overridden.

5 answers

No. From http://tecfa.unige.ch/guides/java/langspec-1.0/javalang.doc1.html:

The general hashCode contract is as follows:

  • Whenever it is invoked on the same object more than once during an execution of a Java application, hashCode must consistently return the same integer. The integer may be positive, negative, or zero. This integer does not, however, have to remain consistent from one Java application to another, or from one execution of an application to another execution of the same application. [...]

It depends on the type:

  • If you have a type that has not overridden hashCode(), it will probably return a different hash code each time the program runs.
  • If you have a type that overrides hashCode() but does not document how it is calculated, it is perfectly legitimate for an object with the same data to return a different hash on each run, as long as it returns the same hash for repeated calls within the same run.
  • If you have a type that overrides hashCode() in a documented way, i.e. the algorithm is part of the documented behavior, then you are probably safe (java.lang.String documents this, for example). Personally, though, I would still avoid relying on it in general.

Just a cautionary tale from the .NET world. I have seen at least a few people in a world of pain after using the result of string.GetHashCode() as their password hash in a database. The algorithm changed between .NET 1.1 and 2.0, and suddenly all the hashes were "wrong." (Jeffrey Richter documents an almost identical case in CLR via C#.) When a hash needs to be stored, I would prefer it to be calculated in a way that is always guaranteed to be stable, for example MD5, or a custom interface implemented by your types with a guarantee of stability.
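
As a rough illustration of a hash that is safe to store, here is a minimal sketch built on the JDK's MessageDigest. The class and method names (StableDigestSketch, md5Hex) are made up for this example, and MD5 is used only because the answer names it; the same pattern works with other algorithm names the JDK provides, such as "SHA-256". The charset is fixed to UTF-8 so that the bytes being hashed do not depend on the platform default encoding.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of the original question or answers.
final class StableDigestSketch {

    // Returns the lowercase hex MD5 digest of the input's UTF-8 bytes.
    static String md5Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            // Explicit charset: getBytes() without one uses the platform default.
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(md5Hex("hello")); // 5d41402abc4b2a76b9719d911017c592
    }
}
```

Unlike hashCode(), the output here is defined by the MD5 algorithm itself, so it depends only on the input bytes, not on the JVM, JDK version or operating system.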


According to the docs, the hash code for a String object is computed as

 s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] 

I'm not sure whether this is a formal specification or just Sun's implementation. At the very least, it should be the same on all existing Sun virtual machines, regardless of platform or operating system.
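
To make the formula concrete, here is a small sketch that evaluates it by hand and compares the result with String.hashCode(). The class and method names (StringHashSketch, polynomialHash) are invented for this example. In the formula, ^ denotes exponentiation (not Java's XOR operator), s[i] is the i-th char and n is the length; the sum is computed in int arithmetic, so overflow simply wraps around.

```java
// Hypothetical helper, written only to illustrate the documented formula.
final class StringHashSketch {

    // Horner-form evaluation of s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1].
    // int overflow wraps, which matches the documented int arithmetic.
    static int polynomialHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        String sample = "hello";
        System.out.println(polynomialHash(sample));                      // 99162322
        System.out.println(polynomialHash(sample) == sample.hashCode()); // true
    }
}
```

Agreement on one JVM does not by itself prove portability, but it does show that the value follows from the documented formula and not from anything machine-specific such as a memory address.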


No. Hash algorithms are not guaranteed unless otherwise specified. For example, when hash-based structures are deserialized, the hash codes have to be recomputed, and these values should not be stored in the serialized form.
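
One way to follow this advice, sketched under assumptions: a key class that caches its hash code in a transient field, so the cached value never reaches the serialized stream and is recomputed lazily after deserialization. The class CachedHashKey and its roundTrip helper are hypothetical and exist only for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical key type; only the name field is serialized.
final class CachedHashKey implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String name;
    // transient: skipped by serialization, reset to 0/false on deserialization.
    private transient int cachedHash;
    private transient boolean hashComputed;

    CachedHashKey(String name) {
        this.name = name;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof CachedHashKey
                && name.equals(((CachedHashKey) other).name);
    }

    @Override
    public int hashCode() {
        if (!hashComputed) {
            // Recomputed in whichever JVM the object currently lives in.
            cachedHash = name.hashCode();
            hashComputed = true;
        }
        return cachedHash;
    }

    // Serializes and deserializes the key in memory.
    static CachedHashKey roundTrip(CachedHashKey key) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(key);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()))) {
                return (CachedHashKey) in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The design choice is that the stream carries only the data the hash is derived from, so the receiving side never trusts a hash value computed elsewhere.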


I would like to add that you can override hashCode() (don't forget equals() if you do) to make sure your business objects return the same hash code everywhere. Then those objects will at least have a predictable hash code.
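
A minimal sketch of such an override, assuming a made-up business object (CustomerId, with region and number fields invented for the example). The hash is built only from a documented formula over the field values, so it does not depend on object identity; it does rely on String.hashCode() being computed as documented.

```java
// Hypothetical business object with a documented, value-based hash code.
final class CustomerId {
    private final String region;
    private final long number;

    CustomerId(String region, long number) {
        this.region = region;
        this.number = number;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof CustomerId)) {
            return false;
        }
        CustomerId that = (CustomerId) other;
        return number == that.number && region.equals(that.region);
    }

    /**
     * Documented algorithm:
     * 31 * region.hashCode() + (int) (number ^ (number >>> 32)).
     */
    @Override
    public int hashCode() {
        return 31 * region.hashCode() + (int) (number ^ (number >>> 32));
    }

    public static void main(String[] args) {
        System.out.println(new CustomerId("EU", 42).hashCode()); // 68986
    }
}
```

Equal objects produce equal hash codes, as the hashCode contract requires, and the Javadoc comment turns the algorithm into documented behavior that callers can rely on.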

