Good hash functions are one-way functions that map any given input to a well-distributed value, so different inputs almost always produce different outputs. They are also repeatable: the same input will always generate the same output.
Examples of good hash functions are SHA-1 and SHA-256.
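To make the "repeatable" and "well-distributed" properties concrete, here is a minimal sketch using the JDK's standard MessageDigest class. The class name Sha256Demo and the helper sha256Hex are made up for this illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class Sha256Demo {
    // Hypothetical helper: returns the SHA-256 digest of the input as lowercase hex.
    static String sha256Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Repeatable: the same input gives the same output.
        System.out.println(sha256Hex("Adams").equals(sha256Hex("Adams"))); // true
        // A one-character change gives a completely different-looking digest.
        System.out.println(sha256Hex("Adams"));
        System.out.println(sha256Hex("Adamz"));
    }
}
```

The digest is always 64 hex characters for SHA-256, no matter how long the input is.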
Say you have a user database table. The columns are id, last_name, first_name, telephone_number and address.
Although any of these columns may contain duplicates, assume that no two rows are identical.
In this case, id is simply a unique primary key that we made up (a surrogate key). The id field does not contain any actual user data; we added it because we could not find a natural key that was unique to users, and we use it to build foreign key relationships with other tables.
We could look up a particular record in our database like this:
SELECT * FROM users WHERE last_name = 'Adams' AND first_name = 'Marcus' AND address = '1234 Main St' AND telephone_number = '555-1212';
The database needs to search four different columns, using four different indexes, to find this record.
However, you could create a new hash column and store in it the hash value of all four columns combined.
String myHash = myHashFunction("Marcus" + "Adams" + "1234 Main St" + "555-1212");
You might get a hash value like AE32ABC31234CAD984EA8.
You store this hash value in a column in the database and put an index on it. Now only one index needs to be searched.
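One detail worth watching when combining columns: plain concatenation is ambiguous, because "ab" + "c" and "a" + "bc" produce the same string and therefore the same hash. The sketch below is one possible way to write myHashFunction, assuming SHA-256 and a length prefix on each field. The class name RowHash and the method rowHash are hypothetical names for this example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class RowHash {
    // Hypothetical helper: hashes the fields with SHA-256, prefixing each
    // field with its length so that field boundaries stay unambiguous.
    static String rowHash(String... fields) {
        StringBuilder joined = new StringBuilder();
        for (String field : fields) {
            joined.append(field.length()).append(':').append(field);
        }
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(joined.toString().getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Same field order as in the example above.
        String myHash = rowHash("Marcus", "Adams", "1234 Main St", "555-1212");
        System.out.println(myHash);
        // Without the length prefix these two calls would hash the same string.
        System.out.println(rowHash("ab", "c").equals(rowHash("a", "bc"))); // false
    }
}
```

Whatever scheme you pick, the same one has to be used when writing the row and when searching for it, otherwise the hashes will never match.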
SELECT * FROM users WHERE hash_value = 'AE32ABC31234CAD984EA8';
Once we have the id of the requested user, we can use that value to look up related data in other tables.
The idea is that the hash computation is offloaded from the database server: the application computes the hash, and the database only has to do a single indexed lookup.
Collisions are unlikely. If two users have the same hash, most likely they have duplicate data.
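If you do not want to rely on collisions being unlikely, you can re-check the real columns after the hash has narrowed the search down. The sketch below shows the idea in memory rather than against a real database: a HashMap stands in for the index on hash_value, and the User class, rowHash, insert and findId are all hypothetical names for this illustration. In SQL, the same effect could be had by keeping the original column comparisons in the WHERE clause next to the hash_value condition.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class HashedUserIndex {
    // Hypothetical in-memory stand-in for one row of the users table.
    static final class User {
        final int id;
        final String firstName, lastName, address, telephoneNumber;

        User(int id, String firstName, String lastName,
             String address, String telephoneNumber) {
            this.id = id;
            this.firstName = firstName;
            this.lastName = lastName;
            this.address = address;
            this.telephoneNumber = telephoneNumber;
        }
    }

    // Plays the role of the index on hash_value: hash -> rows with that hash.
    private final Map<String, List<User>> byHash = new HashMap<>();

    // SHA-256 over the length-prefixed fields, as lowercase hex.
    static String rowHash(String... fields) {
        StringBuilder joined = new StringBuilder();
        for (String field : fields) {
            joined.append(field.length()).append(':').append(field);
        }
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(joined.toString().getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    void insert(User u) {
        String hash = rowHash(u.firstName, u.lastName, u.address, u.telephoneNumber);
        byHash.computeIfAbsent(hash, k -> new ArrayList<>()).add(u);
    }

    // Returns the id of the matching user, or -1 if no row matches.
    int findId(String firstName, String lastName, String address, String telephoneNumber) {
        String hash = rowHash(firstName, lastName, address, telephoneNumber);
        List<User> candidates = byHash.getOrDefault(hash, Collections.<User>emptyList());
        for (User u : candidates) {
            // Re-check the real columns so a collision cannot return the wrong row.
            if (u.firstName.equals(firstName) && u.lastName.equals(lastName)
                    && u.address.equals(address)
                    && u.telephoneNumber.equals(telephoneNumber)) {
                return u.id;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        HashedUserIndex index = new HashedUserIndex();
        index.insert(new User(7, "Marcus", "Adams", "1234 Main St", "555-1212"));
        System.out.println(index.findId("Marcus", "Adams", "1234 Main St", "555-1212")); // 7
        System.out.println(index.findId("Marcus", "Adams", "1234 Main St", "555-0000")); // -1
    }
}
```

The extra comparison only runs on the handful of rows that share the hash, so it costs next to nothing compared with the lookup itself.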