Good hash function for a pair of primitive types

I am trying to figure out a good hash function for std :: a pair of two primitive types. That is how I implemented it right now:

template<typename T, typename U> std::size_t operator()(const std::pair<T,U> &rhs) const { return stdext::hash_value<T>(rhs.first) ^ stdext::hash_value<U>(rhs.second); } 

It works even if I have two pairs, such as (1, 2) and (2, 1) (the numbers are upside down). They generate the same hash value, but the values ​​are still successfully placed in the hash map. Any thoughts?

+7
source share
5 answers

Generally speaking, hash containers should always handle this case (hash collisions). There are several methods that they can use, such as chaining and sensing, any of which can hurt performance.

Instead, I would suggest using boost::hash_combine to combine hashes in such a way that they change first and second do not generate the same hash.

+5
source

Here is the implementation of hash_combine based on documents from the current version of boost: ( http://www.boost.org/doc/libs/1_53_0/doc/html/hash/reference.html#boost.hash_combine )

 template<typename T> void hash_combine(size_t & seed, T const& v) { seed ^= stdext::hash_value(v) + 0x9e3779b9 + (seed << 6) + (seed >> 2); } 

You would use it as follows:

 template<typename T, typename U> std::size_t operator()(const std::pair<T,U> &rhs) const { size_t retval = stdext::hash_value<T>(rhs.first); hash_combine(retval, rhs.second); return retval; } 

I can't really vouch for the logic of this function, but it looks more robust than most of the options mentioned here.

+2
source

Assuming stdext :: hash_value provides a good hash distribution for each of the first and second, what you did is fine if you do not expect a disproportionate number of pairs of mirror images (e.g., (1,2) and (2, one)). If your data set is such that you expect a lot, you might consider setting up a hash function to make them different. For example, inverting the first hash value:

 return ~stdext::hash_value<T>(rhs.first) ^ stdext::hash_value<T>(rhs.second); 

I mention this only because you expressed concern about mirror image pairs. If your input is close to random in this regard, then ^ should be fine. Remember that the goal is to minimize collisions, and not to avoid them completely.

+1
source

As others have noted, hash functions only affect the performance of hash tables, not validity. Your function is bad if the mirror image pairs are frequent. Since this may be a problem in some applications, it would be wise to swap the upper and lower half words of the same hash value, and then xor. Many other schemes are possible, but it is very fast and simple.

 template<typename T, typename U> std::size_t operator()(const std::pair<T,U> &rhs) const { const int h = sizeof(size_t) * 8 / 2; size_t a = stdext::hash_value<T>(rhs.first); return ((a << h) | (a >> h)) ^ stdext::hash_value<U>(rhs.second); } 
0
source

Just for fun, here is another approach that is simple and directly solves problems, in particular:

  • If both first and second match, it returns the total hash value
  • otherwise, XOR assigns values, but:
    • to prevent h (a, b) colliding with h (b, a), he uses <b to choose h (a) ^ h (b) and h (a) ^ h (~ b)

Implementation:

 1 template<typename T, typename U> 2 std::size_t operator()(const std::pair<T,U> &rhs) const 3 { 4 std::size_t a = stdext::hash_value<T>(rhs.first); 5 return rhs.first == rhs.second ? a : 6 a ^ (rhs.first < rhs.second ? stdext::hash_value<U>(rhs.second) 7 : stdext::hash_value<U>(~rhs.second)); 8 } 

Seriously, though, I would recommend following the recommendations of Mark B and using boost :: hash_combine - less branching is likely to improve speed.

0
source

All Articles