The answer is not surprising at all. In fact:

    In [1]: -5768830964305142685L & 0xffffffff
    Out[1]: 1934711907L

So, if you want reliable answers for ASCII strings, just take the lower 32 bits as an unsigned integer. The hash function for strings is 32-bit-safe (its low 32 bits are the same on 32-bit and 64-bit builds) and almost portable.
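As a small illustration of taking the lower 32 bits, the sketch below applies the mask to the 64-bit value from the session above. The helper name `low32` is made up for this example. It works on the integer literal rather than calling hash(), because what hash() prints depends on the interpreter build.

```python
def low32(value):
    """Return the low 32 bits of an integer as an unsigned value."""
    # Python integers behave like infinitely wide two's-complement numbers,
    # so masking a negative value yields a non-negative result.
    return value & 0xFFFFFFFF


hash_64bit = -5768830964305142685  # the 64-bit hash value from the session above
print(low32(hash_64bit))  # 1934711907
```

A value that already fits in 32 bits passes through unchanged, so the same helper can be applied on both kinds of build.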
On the other hand, you cannot rely on hash() being stable for any object unless you have explicitly defined its __hash__ method to be invariant.
For ASCII strings it works only because the hash is computed from the individual characters that make up the string, like this:

    class string:
        def __hash__(self):
            if not self:
                return 0  # empty
            value = ord(self[0]) << 7
            for char in self:
                value = c_mul(1000003, value) ^ ord(char)
            value = value ^ len(self)
            if value == -1:
                value = -2
            return value

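where the function c_mul is a "cyclic" (wraparound) multiplication, as in C: the product is truncated to the machine word size instead of growing into an arbitrary-precision integer.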
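To make the role of c_mul concrete, here is a minimal pure-Python sketch of the algorithm above. It is an illustration under stated assumptions, not the interpreter's actual implementation: the name `legacy_string_hash` and the `bits` parameter are introduced only for this example, and c_mul is emulated by masking the product to the word size and reinterpreting it as a signed value.

```python
def c_mul(a, b, bits=32):
    """Multiply a and b, wrapping the result to a signed integer of `bits` bits."""
    mask = (1 << bits) - 1
    product = (a * b) & mask  # keep only the low `bits` bits
    # Reinterpret the unsigned result as a signed two's-complement value.
    if product >= 1 << (bits - 1):
        product -= 1 << bits
    return product


def legacy_string_hash(s, bits=32):
    """Hypothetical re-implementation of the string hash described above."""
    if not s:
        return 0  # empty string
    value = ord(s[0]) << 7
    for char in s:
        value = c_mul(1000003, value, bits) ^ ord(char)
    value ^= len(s)
    if value == -1:
        value = -2  # -1 is reserved, so it is replaced by -2
    return value


h32 = legacy_string_hash("a string", bits=32)
h64 = legacy_string_hash("a string", bits=64)
# Multiplication and XOR both preserve the low bits, so the two widths agree there.
print((h64 & 0xFFFFFFFF) == (h32 & 0xFFFFFFFF))  # True
```

Because every step only multiplies and XORs, the low 32 bits of the 64-bit result match the 32-bit result, which is why masking with 0xffffffff gives the same answer on both kinds of build.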
rewritten Oct 20 2018-10-10 16:02