Python hash() built-in

Windows XP, Python 2.5:

    hash('http://stackoverflow.com')
    Result: 1934711907

Google App Engine (http://shell.appspot.com/):

    hash('http://stackoverflow.com')
    Result: -5768830964305142685

Why? How can I use a hash function that will give me the same results on different platforms (Windows, Linux, Mac)?

python google-app-engine hash
Apr 27 '09 at 14:31
11 answers

Use hashlib, because hash() was designed to be used to:

quickly compare dictionary keys during a dictionary lookup

and therefore does not guarantee that the result will be the same across Python implementations.
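A minimal Python 3 sketch of the hashlib approach, assuming the strings are encoded as UTF-8 before hashing (hashlib works on bytes, not str). The helper names stable_hash and stable_hash64 are hypothetical, and SHA-256 is just one reasonable choice among the algorithms hashlib provides.

```python
import hashlib


def stable_hash(text):
    """Hex SHA-256 digest of text: the same on every platform and in every run."""
    # hashlib operates on bytes, so encode explicitly (UTF-8 is an assumption)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def stable_hash64(text):
    """First 8 bytes of the SHA-256 digest, read as an unsigned 64-bit integer."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


url = "http://stackoverflow.com"
print(stable_hash(url))
print(stable_hash64(url))
```

Unlike hash(), the digest depends only on the input bytes, so it can be stored in a database or compared between Windows, Linux, Mac and App Engine.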

Apr 27 '09 at 14:33

As the documentation says, the built-in hash() function is not designed for storing the resulting hashes externally. It provides an object's hash value so that objects can be stored in dictionaries and so on. It is also implementation-specific (GAE uses a modified version of Python). For example:

    >>> class Foo:
    ...     pass
    ...
    >>> a = Foo()
    >>> b = Foo()
    >>> hash(a), hash(b)
    (-1210747828, -1210747892)

As you can see, the values differ, because hash() uses the object's __hash__ method (which, by default, is based on the object's identity) instead of a "normal" hash algorithm such as SHA.

Given the above, a rational choice is to use the hashlib module.
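To illustrate the point about __hash__, here is a small Python 3 sketch (the Point class is hypothetical): instances of a plain class hash by identity, while a class that defines __eq__ and __hash__ from its fields gets equal hashes for equal values. This still only makes hash() consistent inside a single interpreter process; for values stored or compared across machines, hashlib remains the better tool.

```python
class Foo:
    pass


class Point:
    """Hypothetical value type whose hash is derived from its contents."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Equal points must hash equal, so hash the same tuple that __eq__ compares
        return hash((self.x, self.y))


a, b = Foo(), Foo()
print(hash(a) == hash(b))                      # False in CPython: identity-based default
print(hash(Point(1, 2)) == hash(Point(1, 2)))  # True: the hash follows the field values
```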

Apr 27 '09 at 14:43

The answer is not surprising at all; in fact:

    In [1]: -5768830964305142685L & 0xffffffff
    Out[1]: 1934711907L

so if you want reliable results for ASCII strings, just take the lower 32 bits as an unsigned int. The string hash function is 32-bit-safe and almost portable.

On the other hand, you cannot rely at all on the hash() of any object for which you have not explicitly defined an invariant __hash__ method.

For ASCII strings it works only because the hash is computed from the individual characters that make up the string, like this:

    class string:
        def __hash__(self):
            if not self:
                return 0  # empty
            value = ord(self[0]) << 7
            for char in self:
                value = c_mul(1000003, value) ^ ord(char)
            value = value ^ len(self)
            if value == -1:
                value = -2
            return value

where c_mul is "cyclic" (modular) multiplication that wraps around on overflow, as in C.
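A runnable Python 3 sketch of the algorithm above, assuming it mirrors the Python 2 str hash for ASCII text; the function name legacy_str_hash and its bits parameter are hypothetical. The c_mul wrap-around is done by masking to the word size, and the final bit pattern is reinterpreted as a signed C long. Running it with bits=32 and bits=64 should reproduce the two numbers from the question, which is why their low 32 bits agree.

```python
def legacy_str_hash(s, bits=64):
    """Hypothetical re-implementation of the Python 2 str hash for ASCII text.

    bits is the width of a C long on the simulated platform (32 or 64).
    """
    mask = (1 << bits) - 1
    if not s:
        return 0  # the empty string hashes to 0
    value = (ord(s[0]) << 7) & mask
    for char in s:
        # "cyclic" multiplication: keep only the low `bits` bits, as C overflow does
        value = ((1000003 * value) & mask) ^ ord(char)
    value ^= len(s)
    # reinterpret the unsigned bit pattern as a signed C long
    if value >= 1 << (bits - 1):
        value -= 1 << bits
    if value == -1:
        value = -2  # -1 is reserved as an error marker in CPython
    return value


url = "http://stackoverflow.com"
print(legacy_str_hash(url, bits=32))
print(legacy_str_hash(url, bits=64))
print(legacy_str_hash(url, bits=64) & 0xffffffff)
```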

Oct 20 2018-10-10

Most answers attribute this to different platforms, but there is more to it. From the documentation of object.__hash__(self):

By default, the __hash__() values of str, bytes and datetime objects are "salted" with an unpredictable random value. Although they remain constant within an individual Python process, they are not predictable between repeated invocations of Python.

This is intended to provide protection against a denial of service caused by carefully chosen inputs that exploit the worst-case performance of a dict insertion, O(n²) complexity. See http://www.ocert.org/advisories/ocert-2011-003.html for details.

Changing hash values affects the iteration order of dicts, sets and other mappings. Python has never made guarantees about this ordering (and it typically varies between 32-bit and 64-bit builds).

Even on the same machine, separate invocations give different results:

    $ python -c "print(hash('http://stackoverflow.com'))"
    -3455286212422042986
    $ python -c "print(hash('http://stackoverflow.com'))"
    -6940441840934557333

While:

    $ python -c "print(hash((1,2,3)))"
    2528502973977326415
    $ python -c "print(hash((1,2,3)))"
    2528502973977326415



See also the PYTHONHASHSEED environment variable:

If this variable is not set or is set to random, a random value is used to seed the hashes of str, bytes and datetime objects.

If PYTHONHASHSEED is set to an integer value, it is used as a fixed seed for generating the hash() of the types covered by hash randomization.

Its purpose is to allow repeatable hashing, such as for self-tests of the interpreter itself, or to allow a cluster of Python processes to share hash values.

The integer must be a decimal number in the range [0, 4294967295]. Specifying the value 0 disables hash randomization.

For example:

    $ export PYTHONHASHSEED=0
    $ python -c "print(hash('http://stackoverflow.com'))"
    -5843046192888932305
    $ python -c "print(hash('http://stackoverflow.com'))"
    -5843046192888932305
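To observe the same effect from Python itself, the following Python 3.7+ sketch runs hash() in fresh interpreters via subprocess, with and without a fixed PYTHONHASHSEED. hash_in_fresh_interpreter is a hypothetical helper; sys.flags.hash_randomization reports whether randomization is active in the current process.

```python
import os
import subprocess
import sys


def hash_in_fresh_interpreter(text, seed=None):
    """Return hash(text) as computed by a brand-new Python process.

    seed is the PYTHONHASHSEED value to use; None removes the variable so
    the child falls back to the default (randomized) behaviour.
    """
    env = dict(os.environ)
    if seed is None:
        env.pop("PYTHONHASHSEED", None)
    else:
        env["PYTHONHASHSEED"] = str(seed)
    # With -c, sys.argv[1] in the child is the first argument after the code string
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", text],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


url = "http://stackoverflow.com"
print("randomization active here:", bool(sys.flags.hash_randomization))
print(hash_in_fresh_interpreter(url), hash_in_fresh_interpreter(url))  # usually differ
print(hash_in_fresh_interpreter(url, seed=0) == hash_in_fresh_interpreter(url, seed=0))  # True
```

The exact value printed for a given seed can still change between Python versions, so a fixed seed makes runs repeatable, not portable.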
Nov 17 '15 at 17:29

Hash results differ between 32-bit and 64-bit platforms.

If the computed hash must be the same on both platforms, use:

    def hash32(value):
        return hash(value) & 0xffffffff
Mar 29 2018-11-11T00:

It looks like AppEngine uses a 64-bit Python implementation (-5768830964305142685 won't fit in 32 bits) and your Python implementation is 32-bit. You can't rely on object hashes being meaningfully comparable across different implementations.
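A quick way to check that premise, as a small Python 3 sketch (fits_signed is a hypothetical helper): test whether the GAE value fits in a signed 32-bit integer, and read sys.hash_info.width, which reports the bit width of hash values on the running interpreter.

```python
import sys


def fits_signed(value, bits):
    """True if value is representable as a signed two's-complement integer of `bits` bits."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


print(sys.hash_info.width)                    # 64 on typical 64-bit builds, 32 on 32-bit builds
print(fits_signed(-5768830964305142685, 32))  # False
print(fits_signed(-5768830964305142685, 64))  # True
print(fits_signed(1934711907, 32))            # True
```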

May 26 '10 at 12:58 a.m.

This is the hash function that Google uses in production for Python 2.5:

    def c_mul(a, b):
        # "cyclic" multiplication: keep only the low 64 bits, as C overflow does
        return (a * b) & (2**64 - 1)

    def py25hash(s):
        if not s:
            return 0  # empty string
        value = ord(s[0]) << 7
        for char in s:
            value = c_mul(1000003, value) ^ ord(char)
        value = value ^ len(s)
        # reinterpret the unsigned 64-bit pattern as a signed C long
        if value >= 2**63:
            value -= 2**64
        # -1 is reserved for errors, so map it to -2 (only meaningful after the sign fix)
        if value == -1:
            value = -2
        return value
Feb 20 2018-12-12T00:

What about the sign bit?

For example:

The hex value 0xADFE74A5 represents 2919134373 as unsigned and -1375832923 as signed. The correct value should be signed (the sign bit is 1), but Python converts it as unsigned, so we get the wrong hash value after truncating from 64 to 32 bits.

Be careful when using:

    def hash32(value):
        return hash(value) & 0xffffffff
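One way to address the sign problem, as a Python 3 sketch (to_signed32 and hash32_signed are hypothetical names): keep the low 32 bits, then reinterpret them as a two's-complement signed value, which is what a 32-bit C long would hold. This only helps for hash values that are platform-independent apart from word size, such as the legacy Python 2 string hash; under Python 3's hash randomization the result still changes between runs.

```python
def to_signed32(value):
    """Reinterpret the low 32 bits of value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF        # keep only the low 32 bits (unsigned view)
    if value >= 0x80000000:    # sign bit set: wrap into the negative range
        value -= 0x100000000
    return value


def hash32_signed(obj):
    """Hypothetical variant of hash32() that preserves the sign bit."""
    return to_signed32(hash(obj))


print(to_signed32(0xADFE74A5))            # -1375832923
print(to_signed32(-5768830964305142685))  # 1934711907
```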
Jan 13 2018-12-15T00:

A polynomial hash for strings. 1000000009 and 239 are arbitrary primes. Accidental collisions are unlikely. Modular arithmetic is not very fast, but a prime modulus is more robust against collisions than reducing modulo a power of 2. Of course, it is easy to find a collision deliberately.

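    MOD = 1000000009  # large prime modulus
    BASE = 239        # prime base

    def poly_hash(s):
        # named poly_hash to avoid shadowing the built-in hash()
        result = 0
        for c in s:
            result = (result * BASE + ord(c)) % MOD
        return result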
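Written out, the loop above evaluates the character codes as the coefficients of a polynomial at 239 (Horner's scheme), reduced modulo 1000000009; here n is the string length and c_i is ord() of the i-th character. Reducing at every step gives the same result as reducing once at the end, it just keeps the intermediate numbers small.

```latex
h(s) = \left( \sum_{i=0}^{n-1} c_i \cdot 239^{\,n-1-i} \right) \bmod 1000000009
```

For example, for the string "ab": h = (97 · 239 + 98) mod 1000000009 = 23281.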
Sep 29 '14 at 18:00

The value of PYTHONHASHSEED can be used to seed the hash values.

Try:

    PYTHONHASHSEED=0 python -c "print(hash('http://stackoverflow.com'))"
Oct 19 '15 at 15:39

It probably just calls a function provided by the operating system rather than using its own algorithm.

As other comments say, use hashlib or write your own hash function.

Apr 27 '09 at 14:38


