Math question regarding Python uuid4

I am not very good at statistical math, etc. I was wondering if I use the following:

    import uuid
    unique_str = str(uuid.uuid4())
    double_str = ''.join([str(uuid.uuid4()), str(uuid.uuid4())])

Is double_str squared as unique as unique_str, or just a little more unique? In addition, are there any negative consequences of doing something like this (for example, a birthday-problem situation)? This may sound ignorant, but I simply don't know, as my math extends to Algebra 2 at best.

+6
math uuid random unique unique-key
3 answers

The uuid4 function returns a UUID created from 16 random bytes, and is extremely unlikely to lead to a collision, to such an extent that you probably shouldn't even worry about that.
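As a small illustration of what "16 random bytes" means in practice, the sketch below inspects a generated UUID using only the standard `uuid` module. One detail worth knowing: a version 4 UUID reserves 4 bits for the version and 2 bits for the variant, so 122 of the 128 bits are actually random. The variable names are only for illustration.

```python
import uuid

u = uuid.uuid4()

# A UUID is always 16 bytes (128 bits) long.
raw_length = len(u.bytes)

# Six of those bits are fixed: 4 encode the version, 2 encode the variant.
version = u.version
variant = u.variant

# That leaves 122 bits that come from the random source.
random_bits = 128 - 4 - 2

# The usual string form is 32 hex digits plus 4 hyphens.
text_length = len(str(u))

print(raw_length, version, variant, random_bits, text_length)
```

The fixed bits do not change the conclusion of the answer; they only mean the range of possible values is slightly smaller than 2**128.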

If for any reason uuid4 does produce a duplicate, it is far more likely to be caused by a programming error, such as a failure to correctly initialize the random number generator, than by a genuine random collision. In that case, the approach you are using will not improve the situation: an incorrectly initialized random number generator can still produce duplicates even with your approach.

If you rely on the default random.seed(None), you can see in the source that only 16 bytes of randomness are used to initialize the random number generator, so this is a problem you would have to solve first. In addition, if the OS does not provide a source of randomness, the system time is used as the seed, which is not very random at all.
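To make the seeding point concrete, here is a minimal sketch of what goes wrong when two generators start from the same state. It does not use `uuid.uuid4()` itself. Instead, `uuid4_from` and `double_from` are hypothetical helpers that build version 4 UUIDs from an explicitly seeded `random.Random`, and the seed value 1234 is arbitrary.

```python
import random
import uuid


def uuid4_from(rng: random.Random) -> uuid.UUID:
    """Hypothetical helper: build a version 4 UUID from a caller-supplied PRNG."""
    return uuid.UUID(int=rng.getrandbits(128), version=4)


def double_from(rng: random.Random) -> str:
    """Hypothetical helper: concatenate two UUIDs drawn from the same PRNG."""
    return str(uuid4_from(rng)) + str(uuid4_from(rng))


# Simulate two processes whose generators were (wrongly) seeded identically.
rng_a = random.Random(1234)
rng_b = random.Random(1234)

# A single UUID from each process collides...
single_collides = str(uuid4_from(rng_a)) == str(uuid4_from(rng_b))

# ...and so does the doubled string, because both generators stay in lockstep.
double_collides = double_from(rng_a) == double_from(rng_b)

print(single_collides, double_collides)
```

Both comparisons come out equal: concatenating more output from the same badly seeded generator adds length, not independence.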

But ignoring these practical problems, you are basically right. To take a mathematical approach, we first need to define what you mean by "uniqueness". I think a reasonable definition is the number of identifiers n that need to be generated before the probability of producing a duplicate exceeds some probability p. A suitable formula for this is:

$$ n \approx \sqrt{2d \ln\left(\frac{1}{1-p}\right)} $$

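where d is 2**(16*8) for a single randomly generated UUID and 2**(16*2*8) with your proposed approach. (Strictly speaking, a version 4 UUID fixes 6 of its 128 bits for the version and variant, so only 122 bits are random and d is 2**122 and 2**244; the argument is unchanged.) The square root in the formula is indeed due to the birthday paradox. If you work it through, you will see that squaring the range of values d while keeping p constant also squares n, up to a constant factor that depends only on p.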
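The arithmetic can be checked with a short sketch. It assumes the standard birthday-problem approximation n ≈ sqrt(2 · d · ln(1 / (1 − p))), takes p = 0.5 as an arbitrary example, and uses 122 random bits per UUID (a version 4 UUID fixes 6 of its 128 bits). The function name `ids_before_collision` is made up for this illustration.

```python
import math


def ids_before_collision(p: float, bits: int) -> float:
    """Approximate number of IDs to generate before the probability of
    at least one duplicate reaches p, for d = 2**bits possible values."""
    d = 2 ** bits
    # -log1p(-p) equals ln(1 / (1 - p)) and stays accurate for small p.
    return math.sqrt(2 * d * -math.log1p(-p))


p = 0.5
n_single = ids_before_collision(p, 122)      # one uuid4
n_double = ids_before_collision(p, 2 * 122)  # two uuid4 values concatenated

# Squaring d does not square n exactly: a constant factor depending on p remains.
ratio = n_double / n_single ** 2

print(f"single: {n_single:.3e}")
print(f"double: {n_double:.3e}")
print(f"double / single**2: {ratio:.3f}")
```

With these assumptions a single UUID gives roughly 2.7e18 identifiers before a 50% chance of a duplicate, and the doubled string lands within a constant factor of that number squared, namely 1 / sqrt(2 · ln(1 / (1 − p))).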

+18

Since uuid4 is based on a pseudo-random number generator, calling it twice will not make the result "more unique" (and may not add any uniqueness at all).

See also: "When should I use uuid.uuid1() vs. uuid.uuid4() in Python?"

+1

It depends on the random number generator, but the uniqueness is almost squared.

-1
