Python __hash__ for equal value objects

Let's say I have some Person objects, and I want to know whether a given one is already in a list:

person in people? 

I don't care about the identity of an object, only that its properties are the same. So I put this in my base class:

 # value comparison only
 def __eq__(self, other):
     return (isinstance(other, self.__class__)
             and self.__dict__ == other.__dict__)

 def __ne__(self, other):
     return not self.__eq__(other)
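As a quick check of the comparison above, here is a minimal sketch showing that `person in people` on a list relies only on `__eq__`. The `ValueObject` base class name, the constructor, and the sample data are assumptions made up for illustration; they are not from the question.

```python
class ValueObject(object):
    """Hypothetical base class; the name is an assumption for this sketch."""

    # value comparison only
    def __eq__(self, other):
        return (isinstance(other, self.__class__)
                and self.__dict__ == other.__dict__)

    def __ne__(self, other):
        return not self.__eq__(other)


class Person(ValueObject):
    def __init__(self, person_id, first_name):
        self.PersonID = person_id
        self.FirstName = first_name


people = [Person(1, "Ann"), Person(2, "Bob")]

# List membership compares with ==, so a distinct but equal object is found.
found = Person(2, "Bob") in people
missing = Person(3, "Cy") in people
```

Lists never call `__hash__`, which is why this works without one; sets and dict keys are where the hash comes in.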

But in order to be able to check membership in sets, I also need to define __hash__, so:

 # sets use __hash__ for equality comparison
 def __hash__(self):
     return (
         self.PersonID,
         self.FirstName,
         self.LastName,
         self.etc_etc...
     ).__hash__()

The problem is that I do not want to list each property, and I do not want to change the hash function every time I change properties.

So is it OK to do this instead?

 # sets use __hash__ for equality comparison
 def __hash__(self):
     values = tuple(self.__dict__.values())
     return hash(values)

Is this reasonable, and not too expensive, for a web application?

Thanks.

+7
python set hash
3 answers

Because dictionaries are unordered, tuple(self.__dict__.values()) can produce different results for equal objects if the dict happens to be ordered differently (which could happen, for example, if the attributes were assigned in a different order).
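A minimal sketch of the problem, assuming Python 3.7 or later, where a dict keeps insertion order (so the order of `values()` follows the order of assignment). The attribute names and values are made up for illustration.

```python
class Person(object):
    def __eq__(self, other):
        return (isinstance(other, self.__class__)
                and self.__dict__ == other.__dict__)

    def __hash__(self):
        # The hash proposed in the question: depends on the order of values().
        return hash(tuple(self.__dict__.values()))


a = Person()
a.FirstName = "Ann"
a.LastName = "Lee"

b = Person()
b.LastName = "Lee"      # same attributes, assigned in the opposite order
b.FirstName = "Ann"

equal = (a == b)                          # dict equality ignores order
values_a = tuple(a.__dict__.values())     # values in assignment order
values_b = tuple(b.__dict__.values())     # same values, different order
```

Here `a == b` holds, but the two tuples being hashed differ, so the hashes will almost certainly differ too, and a set may then treat the two equal objects as distinct.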

Since your values are hashable, you can try this instead:

 return hash(frozenset(self.__dict__.items()))

Alternatively, note that __hash__ does not need to take every attribute into account, because __eq__ is still used to check for equality when two hash values compare equal. So you can get away with

 return hash(self.PersonID) 

assuming PersonID is relatively unique across all instances.
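To illustrate why hashing a single attribute is enough, here is a small sketch. The constructor and the sample data are assumptions for the example; only PersonID and FirstName come from the question.

```python
class Person(object):
    def __init__(self, person_id, first_name):
        self.PersonID = person_id
        self.FirstName = first_name

    def __eq__(self, other):
        return (isinstance(other, self.__class__)
                and self.__dict__ == other.__dict__)

    def __ne__(self, other):
        return not self.__eq__(other)

    def __hash__(self):
        # Only PersonID feeds the hash; __eq__ still compares every attribute.
        return hash(self.PersonID)


people = {Person(1, "Ann"), Person(2, "Bob")}

# Same hash and equal attributes: found in the set.
same_values = Person(1, "Ann") in people
# Same hash but a different FirstName: __eq__ rejects it.
same_id_only = Person(1, "Anna") in people
```

The hash only narrows the search; equality has the final say. The trade-off is that many objects sharing one PersonID would all land in the same bucket and make lookups slower.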

+4

If you are already using __dict__ equality for __eq__, it would be foolish not to use __dict__ for __hash__. However, values() gives an arbitrarily ordered list that carries no information about which value belongs to which attribute, so the code does not actually work. Instead, you can try

 return hash(tuple(sorted(self.__dict__.items())))

or

 return hash(frozenset(self.__dict__.items()))

Both of these fix the ordering issue and keep the attribute names alongside the values.
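A minimal sketch of both variants, written with `items()`, which exists on both Python 2 and Python 3. The helper `sorted_items_hash` and the sample attributes are hypothetical names introduced only for this example, and both variants assume every attribute value is hashable.

```python
class Person(object):
    def __eq__(self, other):
        return (isinstance(other, self.__class__)
                and self.__dict__ == other.__dict__)

    def __hash__(self):
        # Pair each value with its attribute name; a frozenset ignores order.
        return hash(frozenset(self.__dict__.items()))


def sorted_items_hash(obj):
    # Hypothetical helper showing the other variant: sort the (name, value)
    # pairs by name so the tuple is built in a stable order.
    return hash(tuple(sorted(obj.__dict__.items())))


a = Person()
a.FirstName = "Ann"
a.LastName = "Lee"

b = Person()
b.LastName = "Lee"      # same attributes, assigned in the opposite order
b.FirstName = "Ann"

unique = {a, b}         # equal objects with equal hashes collapse to one
```

Because attribute names are unique, sorting the pairs only ever compares the names, so the values do not need to be orderable for the sorted variant.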

+1

Thanks for the nice question. You did exactly what I wanted to do. After reading these answers, I did something similar, but with a few differences.

 def __str__(self):
     return "{}({})".format(
         type(self).__name__,
         ", ".join(["{}={}".format(k, self.__dict__[k])
                    for k in sorted(self.__dict__)]))

 def __eq__(self, other):
     return isinstance(other, type(self)) and self.__dict__ == other.__dict__

 def __ne__(self, other):
     return not self == other

 def __hash__(self):
     return hash(tuple(self.__dict__[k] for k in sorted(self.__dict__)))

I included the __str__ method as a bonus; I went back and changed it after working out the hash method.

I found in another answer that self.__eq__ should not be called directly, so I used == instead.

This hash uses a tuple of the object's attribute values, sorted by key, which guarantees a consistent ordering in the tuple. If you sorted the values instead, two objects whose attribute values were swapped would get the same hash.
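A small sketch of that last point. Plain dicts stand in for `self.__dict__` here, and the two helper function names are hypothetical, introduced only for this illustration.

```python
def hash_by_sorted_keys(attrs):
    # Values taken in key order: position in the tuple identifies the attribute.
    return hash(tuple(attrs[k] for k in sorted(attrs)))


def hash_by_sorted_values(attrs):
    # Values sorted on their own: which attribute held which value is lost.
    return hash(tuple(sorted(attrs.values())))


# Two different people whose first and last names are swapped.
a = {"FirstName": "Lee", "LastName": "Ann"}
b = {"FirstName": "Ann", "LastName": "Lee"}

# Sorting the values produces the same tuple for both, hence the same hash.
values_collide = hash_by_sorted_values(a) == hash_by_sorted_values(b)

# Ordering by key keeps the tuples distinct.
key_tuples_differ = (tuple(a[k] for k in sorted(a))
                     != tuple(b[k] for k in sorted(b)))
```

A collision like this is not a correctness bug, since `__eq__` still tells the objects apart, but it makes the hash less useful than it could be.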

0