How is __hash__ implemented in Python 3.2?

I want to make a custom object hashable (via pickling). I could find the __hash__ algorithm for Python 2.x (see the code below), but it clearly differs from the hash in Python 3.2 (I wonder why?). Does anyone know how __hash__ is implemented in Python 3.2?

    # Version: Python 3.2

    def c_mul(a, b):
        # C type multiplication
        return eval(hex((int(a) * b) & 0xFFFFFFFF)[:-1])

    class hs:
        # Python 2.x algorithm for hash
        # from http://effbot.org/zone/python-hash.htm
        def __hash__(self):
            if not self:
                return 0  # empty
            value = ord(self[0]) << 7
            for char in self:
                value = c_mul(1000003, value) ^ ord(char)
            value = value ^ len(self)
            if value == -1:
                value = -2
            return value

    def main():
        s = ["PROBLEM", "PROBLEN", "PROBLEO", "PROBLEP"]  # , "PROBLEQ", "PROBLER", "PROBLES"]
        print("Python 3.2 hash() bild-in")
        for c in s[:]:
            print("hash('", c, "')=", hex(hash(c)), end="\n")
        print("\n")
        print("Python 2.x type hash: __hash__()")
        for c in s[:]:
            print("hs.__hash__('", c, "')=", hex(hs.__hash__(c)), end="\n")

    if __name__ == "__main__":
        main()

    OUTPUT:
    Python 3.2 hash() bild-in
    hash(' PROBLEM ')= 0x7a8e675a
    hash(' PROBLEN ')= 0x7a8e6759
    hash(' PROBLEO ')= 0x7a8e6758
    hash(' PROBLEP ')= 0x7a8e6747

    Python 2.x type hash: __hash__()
    hs.__hash__(' PROBLEM ')= 0xa638a41
    hs.__hash__(' PROBLEN ')= 0xa638a42
    hs.__hash__(' PROBLEO ')= 0xa638a43
    hs.__hash__(' PROBLEP ')= 0xa638a5c

Edit: The difference is explained for Python 3.2: "Hash values are now values of a new type, Py_hash_t, etc."

Edit 2: @Pih, thanks. Link: http://svn.python.org/view/python/trunk/Objects/stringobject.c?view=markup

    static long
    string_hash(PyStringObject *a)
    {
        register Py_ssize_t len;
        register unsigned char *p;
        register long x;

        if (a->ob_shash != -1)
            return a->ob_shash;
        len = Py_SIZE(a);
        p = (unsigned char *) a->ob_sval;
        x = *p << 7;
        while (--len >= 0)
            x = (1000003*x) ^ *p++;
        x ^= Py_SIZE(a);
        if (x == -1)
            x = -2;
        a->ob_shash = x;
        return x;
    }
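The C loop above can be sketched in Python 3 roughly as follows. This is a minimal sketch, not the interpreter's actual code: the names `to_signed` and `string_hash` and the `bits` parameter are hypothetical, chosen to model the width of the C integer, and `ord()` on a `str` only matches the byte values the C code sees for ASCII text. One thing worth noting about the Python 2 port in the question: in Python 3, `hex()` no longer appends an `L` suffix, so slicing its result with `[:-1]` drops the last hex digit instead of the suffix.

```python
def to_signed(value, bits):
    """Reinterpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    if value >= 1 << (bits - 1):
        value -= 1 << bits
    return value


def string_hash(text, bits=64):
    """Hypothetical port of string_hash(); `bits` models the C integer width."""
    if not text:
        # Assumption: an empty string hashes to 0, as in the effbot version.
        return 0
    mask = (1 << bits) - 1
    x = (ord(text[0]) << 7) & mask
    for ch in text:
        # Masking after every step emulates C integer wrap-around.
        x = ((1000003 * x) ^ ord(ch)) & mask
    x ^= len(text)
    x = to_signed(x, bits)
    # -1 is reserved as an error marker in the C API, so it is remapped.
    return -2 if x == -1 else x


if __name__ == "__main__":
    for word in ["PROBLEM", "PROBLEN", "PROBLEO", "PROBLEP"]:
        print(word, hex(string_hash(word, bits=32) & 0xFFFFFFFF))
```

Passing `bits=32` or `bits=64` makes it possible to compare the result with interpreters built for either width.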
+4
3 answers

Why they differ is explained in the documentation:

Hash values are now values of a new type, Py_hash_t, which is defined to be the same size as a pointer. Previously they were of type long, which on some 64-bit operating systems is still only 32 bits long.

Hashing also takes some new values into account; take a look at

  sys.hash_info 
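As a quick way to inspect these values, something like the following could be run on Python 3.2 or later. It is only a sketch: `hash_layout` is a hypothetical helper name, and comparing against `struct.calcsize("P")` is just one way to see the pointer size that Py_hash_t is said to match.

```python
import struct
import sys


def hash_layout():
    """Collect the hash width reported by the interpreter and the pointer size."""
    return {
        "hash_bits": sys.hash_info.width,          # width of Py_hash_t in bits
        "pointer_bits": struct.calcsize("P") * 8,  # size of a C pointer in bits
        "modulus": sys.hash_info.modulus,          # prime used for numeric hashes
    }


if __name__ == "__main__":
    for name, value in hash_layout().items():
        print(name, "=", value)
```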

For strings, you can take a look at http://svn.python.org/view/python/trunk/Objects/stringobject.c?view=markup, line 1263, string_hash(PyStringObject *a).

+5

The "What's New" documentation has a hint that answers your question. Take a look here!

+3

I looked at the new function in the source (in unicodeobject.c) and rebuilt it in Python. Here it is:

    def my_hash(string):
        x = ord(string[0]) << 7
        for c in string:
            x = (1000003 * x) ^ ord(c)
        x ^= len(string)
        # Bit 63 is the sign bit of a 64-bit integer.
        needCorrection = x & (1 << 63)
        x %= 2 ** 64
        if needCorrection:
            # Reinterpret as a negative number; same as x - 2 ** 64.
            x = -~(-x ^ 0xFFFFFFFFFFFFFFFF)
        if x == -1:
            x = -2
        return x

This works for the 64-bit version only. It is now adjusted for Python's odd behavior when numbers become negative. (Better not to think about it too much.)
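The behavior with negative numbers comes from Python integers having arbitrary precision: they never overflow, so C wrap-around has to be emulated by hand. The sketch below, with hypothetical helper names, suggests that the expression `-~(-x ^ 0xFFFFFFFFFFFFFFFF)` from `my_hash` amounts to subtracting 2**64, which is the usual way to reinterpret an unsigned 64-bit value whose sign bit (bit 63) is set.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def wrap_with_bit_trick(x):
    """The expression used in my_hash; intended for 0 < x < 2**64."""
    return -~(-x ^ MASK64)


def wrap_with_subtraction(x):
    """The same reinterpretation written as plain arithmetic."""
    return x - (1 << 64)


def as_signed_64(x):
    """Keep the low 64 bits of x and read them as a signed C integer."""
    x &= MASK64
    # Only values with bit 63 set represent negative numbers.
    return x - (1 << 64) if x & (1 << 63) else x


if __name__ == "__main__":
    for value in (1 << 63, (1 << 63) + 12345, MASK64):
        print(value, wrap_with_bit_trick(value), wrap_with_subtraction(value))
```

The subtraction form is arguably easier to read, and `as_signed_64` only applies it when the sign bit is actually set.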

+1
