MD5 hash collisions.

If the count is from 1 to X, where X is the first number having a md5 collision with the previous number, what number is X?

I want to know if I use md5 for serial numbers, how many units can I expect to be able to list before I get a collision.

+8
collision hash-collision hash md5
source share
6 answers

Theoretically, you can expect a collision for X around 2 64 . For a hash function with an output of n bits, the first collisions appear when you have accumulated about 2 n / 2 outputs (no matter how you select the inputs, sequential integer values ​​are nothing special respect for this).

Of course, MD5 showed that it is not a good hash function. In addition, 2 n / 2 is only average. So why not give it a try? Take the MD5 implementation, hash your serial numbers and see if you have a collision. The main MD5 implementation should have a hash of several million values ​​per second and with a reasonable hard drive, you could accumulate several billion outputs, sort them and see if there is a collision.

+5
source share

I cannot answer your question, but what you are looking for is uuid . UUID serial numbers may be unique across millions of products, but you may need to check your database to reduce the tiny chance of a collision.

+2
source share

I believe that no one has tried on this

Given that if you have a simple incremental number, you do not need to hash it

+1
source share

As far as I know, there are no known collisions in md5 for 2 ^ 32 (integer size)

+1
source share

It depends on the size of your input. An ideal hash function has collisions each (hash input_1 / hash length). If your entrance is a small collision, it is unlikely that until now there has been only one single-block collision.

0
source share

I understand that this is an old question, but I came across it, found a much better approach and decided that I would share it.

You have an upper bound for your sequence number N, so let's take advantage of this. Let them say that N <2 32 β‰ˆ 4.3 * 10 10 . Now every time you need a new identifier, you simply select a random 32-bit number R and combine it with R xor N (zero-pad before concatenation). This gives a unique unique 64-bit identifier, which you can designate with only 16 hexadecimal digits.

This approach completely prevents collisions, because two identifiers that have the same random component necessarily have different xor-ed components.

Bonus function: you can split such a 64-bit identifier into two 32-bit numbers and xor them with each other to restore the original serial number.

0
source share

All Articles