Goal analysis and choosing a good hash function

This is not a specific issue with a specific solution; but this is more of an answer to the fact that I cannot find a good question about stack overflow on how to choose a good hash function for hash tables and similar tasks.

So! Lets say hash functions and how to choose one. How should you program noob, which needs to choose a good hash function for its specific task, choose one? When do you need a simple and fast Fowler-Noll-Vo? When should they sell at MurmurHash3 instead? Do you have links to good resources when comparing different options?

+7
source share
2 answers

A hash function for hash tables must have these two properties

  • Homogeneity All outputs H () should be evenly distributed as much as possible. In other words, for a 32-bit hash function, the probability of each output should be 1/2 ^ 32. (for n-bits, this should be 1/2 ^ n). With a uniform hash function, the probability of collision is minimized to a minimum for any possible input.
  • Low computational costs It is assumed that the hash functions for the tables will be FAST compared to cryptographic hash functions, where the rate is traded for resistance to the inverse image (for example, it is difficult to find a message from a given value of the hash function) and collision resistance.

For hash tables, all cryptographic functions have a BAD choice, since the computational cost is huge. Since hashing is used here not for security, but for quick access. MurmurHash is considered one of the fastest and most uniform features suitable for large hash tables or hash indexes. For small tables, the trivial hash function should be fine. A trivial hash is where we mix the values โ€‹โ€‹of an object (by multiplying, adding and subtracting with some simple).

+4
source

If your hash keys are strings (or other data of variable length), you can see this article by Ramakrishna and Zobel. They compare several classes of hashing functions (for speed and low collisions) and show a class that is better than regular Bernstein hashes.

+1
source

All Articles