Are there situations where a hash algorithm can be guaranteed to be unique?

If I have data of a similar kind with a bounded size (e.g. social security numbers) and hash it with an algorithm whose output is larger than the data (e.g. SHA-256), does the hash guarantee the same level of uniqueness as the original data?

+7
unique hash hash-code-uniqueness sha256
5 answers

You can always create a custom hash that guarantees uniqueness. For data in a known domain (for example, SSN), the exercise is relatively simple.

If your target hash value has more bits available than the data you are hashing, the hash simply maps each input value to one of the available output values. This can be a simple linear mapping from the input value, read as a multibyte integer, to the output, also a multibyte integer.

If the target hash value has fewer bits than the data being hashed, then uniqueness cannot be guaranteed.
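As a minimal sketch of such a custom mapping for SSNs, assuming the input is nine decimal digits with optional dashes (the function names `ssn_to_hash` and `hash_to_ssn` are hypothetical, not from any library):

```python
def ssn_to_hash(ssn: str) -> int:
    """Map an SSN such as '123-45-6789' to a distinct integer in [0, 10**9)."""
    digits = ssn.replace("-", "")
    if len(digits) != 9 or not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"not a 9-digit SSN: {ssn!r}")
    # Reading the digits as one integer is injective: different SSNs
    # always give different values, so no collision is possible.
    return int(digits)


def hash_to_ssn(value: int) -> str:
    """Invert ssn_to_hash; being invertible is what proves uniqueness."""
    if not 0 <= value < 10**9:
        raise ValueError(f"out of range: {value}")
    digits = f"{value:09d}"
    return f"{digits[:3]}-{digits[3:5]}-{digits[5:]}"


print(ssn_to_hash("123-45-6789"))   # 123456789
print((10**9 - 1).bit_length())     # 30, so a 32-bit output is enough
```

Note that a mapping like this is trivially reversible, so it guarantees uniqueness but hides nothing about the original value.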

+5

The probability of a hash collision has nothing to do with the size of the input string (except to the extent that it bounds how many distinct inputs you need to keep unique). Even with an ideal hash algorithm, hashing just 0 and 1 can produce a collision, although the probability is 1 / 2^(output length in bits), which in the case of SHA-256 is effectively zero.

Hash collisions are an instance of the birthday paradox. For a 256-bit hash, the probability that at least two inputs collide depends only on the number of inputs and is:

  • 1 - (2^256)! / ((2^256)^inputcount * (2^256 - inputcount)!)

Or, as others have said, basically zero for a reasonable number of inputs.
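To get a feel for the numbers, here is a rough sketch that evaluates the birthday probability. It assumes an ideal hash whose outputs are uniformly distributed; the function names are hypothetical. The exact version computes the same quantity as the factorial expression, written as a product, and the approximate version uses the standard estimate 1 - exp(-n(n-1) / (2N)), which is the only practical option when N is 2^256.

```python
import math


def collision_probability_exact(n: int, space: int) -> float:
    """1 - space! / (space**n * (space - n)!), computed as a product.

    Only practical for small n, and only accurate when the result is
    not too close to zero.
    """
    p_unique = 1.0
    for i in range(n):
        p_unique *= (space - i) / space
    return 1.0 - p_unique


def collision_probability_approx(n: int, space: int) -> float:
    """Birthday approximation 1 - exp(-n(n-1) / (2 * space)).

    expm1 keeps precision when the probability is extremely small.
    """
    return -math.expm1(-(n * (n - 1)) / (2 * space))


# Classic birthday problem: 23 people, 365 days (a bit over 0.5).
print(collision_probability_exact(23, 365))
print(collision_probability_approx(23, 365))

# One billion inputs (about every possible SSN) into a 256-bit hash:
# the result is on the order of 1e-60.
print(collision_probability_approx(10**9, 2**256))
```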
+4

Others have pointed out that collisions should not be a concern; that is the whole point of cryptographically secure hash functions. I would like to add the following:

  • If your input set is small enough (for example, SSN data, of which there are fewer than a billion), then the absence of collisions can be verified: just check it exhaustively.
  • If the input set is too large to be scanned exhaustively, you should not expect to be able to prove that there are no collisions. Good hash functions are expected to behave like random oracles, and on a random oracle you cannot prove such a property without trying every input. Being able to prove the absence of collisions any other way would look suspiciously like a weakness in the function.
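The exhaustive check from the first point could be sketched roughly as follows. This is an illustration under assumptions, not a tuned tool: `find_collision` and its `digest_bytes` parameter are hypothetical names, the inputs are assumed to be ASCII strings, and holding a billion digests in an in-memory dict would need far more RAM than the small sample used here (sorting the digests on disk would be one alternative).

```python
import hashlib
from typing import Iterable, Optional, Tuple


def find_collision(
    inputs: Iterable[str], digest_bytes: int = 32
) -> Optional[Tuple[str, str]]:
    """Return the first pair of distinct inputs with equal digests, or None.

    digest_bytes truncates the SHA-256 output; the default of 32 keeps
    the full digest. Truncation is only here to show what a collision
    looks like when the output is too small.
    """
    seen = {}
    for item in inputs:
        digest = hashlib.sha256(item.encode("ascii")).digest()[:digest_bytes]
        other = seen.get(digest)
        if other is not None and other != item:
            return (other, item)
        seen[digest] = item
    return None


# A small slice of the SSN space, written as 9-digit strings.
sample = [f"{n:09d}" for n in range(100_000)]

print(find_collision(sample))                  # full SHA-256
print(find_collision(sample, digest_bytes=1))  # 1-byte output: must collide
```

With a one-byte output there are only 256 possible digests, so any set of more than 256 inputs is forced to collide, which is the pigeonhole argument in miniature.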
+2

If you use a cryptographic hash like SHA, then the short answer is yes.

+1

One key feature of a cryptographically secure hash function is that you can rule out collisions beyond reasonable doubt, regardless of the input. This also holds for input shorter than the output size, which is effectively the same as a longer message with little entropy. So you can use SHA-2 without worrying about collisions.

+1
