How is the __hash__ return value used?

Suppose I am writing a class, but do not define __hash__ for it. Then __hash__(self) defaults to id(self) ( self memory address), as per the documentation .

However, I do not see in the documentation how this value is used.
Therefore, if my __hash__ was just return 1 , which would cause the hash of all instances of my class to be the same, they are all transferred to the same base hash array (which I assume is implemented in C). However, this does not mean that the __hash__ return value __hash__ used as the key to the bin elements in this base hash table.
So, really, my question is: what happens to the value returned by __hash__ ? Is it used as a key directly or its hash (or the result of some other calculation performed on it) used as a key to the hash table?

In case that matters, I am on python2.7

EDIT . To clarify, I am not asking how hash collisions are handled. In python, this looks like a linear chain . Instead, I ask how the return value of __hash__ converted to the memory address (?) __hash__ corresponding bucket.

+3
source share
3 answers

Since Python hash tables have a size equal to two forces, the least significant bits of the hash value determine the location in the hash table (or at least the location of the original probe).

The sequence of probes in the size of table n is determined as follows:

 def gen_probes(hashvalue, n): 'Same sequence of probes used in the current dictionary design' mask = n - 1 PERTURB_SHIFT = 5 if hashvalue < 0: hashvalue = -hashvalue i = hashvalue & mask yield i perturb = hashvalue while True: i = (5 * i + perturb + 1) & 0xFFFFFFFFFFFFFFFF yield i & mask perturb >>= PERTURB_SHIFT 

For example, a dictionary:

 d = {'timmy': 'red', 'barry': 'green', 'guido': 'blue'} 

stored as an array of size 8 with each entry in the form (hash, key, value) :

 entries = [['--', '--', '--'], [-8522787127447073495, 'barry', 'green'], ['--', '--', '--'], ['--', '--', '--'], ['--', '--', '--'], [-9092791511155847987, 'timmy', 'red'], ['--', '--', '--'], [-6480567542315338377, 'guido', 'blue']] 

The C source code for entering keys in Python dictionaries can be found here: http://hg.python.org/cpython/file/cd87afe18ff8/Objects/dictobject.c#l550

+2
source

When an object is stored in a dictionary, __hash__ used to determine the source bin in which the object is placed. However, this does not mean that one object will be confused with another in the dictionary - they still check the equality of the object. It just means that the dictionary will be a bit slower in hashing this type of object than others.

+1
source

Of course, logically (in terms of code using a hash table) the object itself is the key. If you are looking for the key "foo" in the hash table, no matter what other objects in the hash table have the same hash value as "foo" , the corresponding value will be returned only if one of the pairs key-value stored in a hash table has a key equal to "foo" .

I donโ€™t know exactly what Python is, but a hash table implementation should consider hash collisions. If the hash table array has N slots, then if you enter N + 1 values โ€‹โ€‹(and the table will not be changed first), there should be a collision. In addition, as in the case where you mentioned where __hash__ always returns 1 or just as a quirk of implementing a hash function, it is possible to have two objects with exactly the same hash code.

There are two main strategies used to handle hash collisions in a hash table for a single machine in memory (different methods used for distributed hash tables, etc.):

  • Each slot in the array is a list (usually a linked list), and all the values โ€‹โ€‹that hash on k modulo N are placed on the list in slot k . Therefore, if hash values โ€‹โ€‹collide, this is not a problem because both objects with the same hash value fall into the same list.
  • Some sensing scheme. Basically, if the object you are inserting has a hash value equal to k modulo N , you are looking at slot k . If it is full, you will apply some formula to your current location (maybe just add 1) and see the next slot. You follow the regular pattern to select the next slot, given the original hash value and the number of probes so far, and continue to research until you find an open slot. This is less used, because if you are not careful in your implementation, you may encounter clustering problems, that is, you will have to check many times before detecting an object.

Wikipedia talks more about hash table implementations here .

0
source

All Articles