If you really need proof that you get no collisions, the only option is to use the concatenation of all fields as the key, with a separator that is not contained in the fields. Of course, this will normally be very long and cumbersome to work with.
What everyone usually does is feed this string to a hash function. Although the result is theoretically not unique, with a suitable hash function and a sufficiently large output you should be able to find one that is unlikely to cause a collision during the lifetime of the human race. For example, git uses such a hash (SHA-1), and Linus Torvalds writes about accidental collisions:
First of all, let me remind people that the inadvertent kind of collision is actually really, really unlikely, so we are very likely to never see it in the entire history of the universe.
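To put a rough number on "unlikely", here is a minimal sketch of the standard birthday approximation, assuming the hash behaves like a uniformly random function. The function name `collision_probability` and the row counts are only illustrative.

```python
import math


def collision_probability(n_items: int, hash_bits: int) -> float:
    """Approximate chance that at least two of n_items share a hash value.

    Birthday approximation: p ~= 1 - exp(-n * (n - 1) / (2 * 2**bits)).
    Assumes the hash output is uniformly distributed.
    """
    exponent = n_items * (n_items - 1) / (2 * 2 ** hash_bits)
    # expm1 keeps precision when the probability is extremely small
    return -math.expm1(-exponent)


# A billion rows hashed with a 160-bit hash (the size of SHA-1)
print(collision_probability(10 ** 9, 160))

# The same formula shows why a 32-bit hash is not enough for 65,536 rows
print(collision_probability(2 ** 16, 32))
```

With a 160-bit output the estimate stays vanishingly small even for a billion rows, while a 32-bit hash is already more likely than not to collide well before a hundred thousand rows.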
Collisions that are not accidental are a different matter. First of all, you need to make sure that the string you start with is not the same for different column values. This means:
- Make sure all columns are included.
- Make sure the columns are separated by something that is not contained in the columns themselves. Use escaping if necessary. For example, if you just concatenate two columns, the values "abc" + "def" will give you the same result as "a" + "bcdef".
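The two rules above can be sketched as follows. This is only an illustration, assuming every column value is already a string and every row has the same number of columns. The separator `|`, the escape character `\`, and the helper names `escape`, `row_string` and `row_hash` are hypothetical choices, not part of any library.

```python
import hashlib

SEP = "|"   # hypothetical separator between columns
ESC = "\\"  # hypothetical escape character


def escape(value: str) -> str:
    # Escape the escape character first, then the separator,
    # so an escaped separator can never be mistaken for a real one.
    return value.replace(ESC, ESC + ESC).replace(SEP, ESC + SEP)


def row_string(columns: list[str]) -> str:
    # Include every column, joined by a separator that cannot
    # appear unescaped inside a value.
    return SEP.join(escape(c) for c in columns)


def row_hash(columns: list[str]) -> str:
    # Feed the unambiguous string to a hash function.
    return hashlib.sha256(row_string(columns).encode("utf-8")).hexdigest()


# Plain concatenation would make these two rows identical
print(row_string(["abc", "def"]))
print(row_string(["a", "bcdef"]))

# Values that contain the separator stay distinguishable thanks to escaping
print(row_string(["a|b", "c"]))
print(row_string(["a", "b|c"]))
```

Escaping the escape character itself matters: without it, a value ending in a backslash followed by a separator could be read as an escaped separator inside a single value.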
If you need to worry about targeted attacks, that is, someone actually trying to create entries with the same hash, it is best to use a cryptographic hash, possibly one of those used for hashing passwords, which are often designed to be slow in order to prevent brute-force attacks. Of course, this may conflict with the requirement of most applications to be as fast as possible.
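As a rough sketch of that trade-off, the following compares a fast cryptographic hash (SHA-256) with a deliberately slow password-style derivation (PBKDF2) from Python's standard `hashlib`. SHA-256 is used here rather than SHA-1 because deliberate SHA-1 collisions have been demonstrated in practice. The fixed salt and the iteration count are assumptions made for illustration: a constant salt is used so that equal rows still produce equal digests, which is not how a salt would be used for passwords.

```python
import hashlib

# Assumption: a constant, application-wide salt so that identical rows
# always map to the same digest. Real password hashing uses a random
# per-entry salt instead.
FIXED_SALT = b"example-application-salt"


def fast_digest(row: str) -> str:
    # Cryptographic hash, fast to compute
    return hashlib.sha256(row.encode("utf-8")).hexdigest()


def slow_digest(row: str, iterations: int = 100_000) -> str:
    # PBKDF2 repeats an HMAC many times, making each attempt expensive
    derived = hashlib.pbkdf2_hmac(
        "sha256", row.encode("utf-8"), FIXED_SALT, iterations
    )
    return derived.hex()


row = "abc|def"
print(fast_digest(row))
print(slow_digest(row))
```

Every hashed row pays the cost of the slow variant, so the iteration count has to be weighed against how many rows the application needs to process.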