Convert string to random but deterministically repeatable uniform probability

How can I convert a string, e.g. a user identifier plus salt, to a random but deterministically repeatable uniform probability in the half-open range [0.0, 1.0)? This means that the output is ≥ 0.0 and < 1.0. The output distribution should be uniform regardless of the input distribution. For example, if the input string is 'a3b2Foobar', the output probability might be 0.40341504.

Cross-language and cross-platform algorithmic reproducibility is desirable. I am inclined to use a hash function unless there is a better way. Here is what I have:

    >>> import hashlib
    >>> in_str = 'a3b2Foobar'
    >>> (int(hashlib.sha256(in_str.encode()).hexdigest(), 16) % 1e8) / 1e8
    0.40341504

I am using the latest stable Python 3. Note that this question is similar, but not identical, to the related question "convert an integer to a random but deterministically repeated choice".

1 answer

Using random

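The random module can be used with in_str as its seed. Creating a dedicated random.Random instance, rather than reseeding the module-level generator, addresses concerns about both thread safety and continuity of the shared random state.

With this approach, however, cross-language algorithmic reproducibility is a concern, because the result depends on Python's own generator and seeding procedure.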

    import random

    def str_to_probability(in_str):
        """Return a reproducible uniformly random float in the interval [0, 1) for the given seed."""
        return random.Random(in_str).random()

    >>> str_to_probability('a3b2Foobar')
    0.4662507245848473
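The point about thread safety and continuity can be checked with a small sketch. It assumes only the standard library; the seed value 12345 is arbitrary and chosen for illustration. Because each call builds its own random.Random instance, the module-level generator should be left exactly as it was.

```python
import random

def str_to_probability(in_str):
    """Return a reproducible float in [0, 1) seeded by the given string."""
    return random.Random(in_str).random()

# Put the shared module-level generator into a known state (arbitrary seed).
random.seed(12345)
state_before = random.getstate()

p_first = str_to_probability('a3b2Foobar')
p_second = str_to_probability('a3b2Foobar')

# Same input, same output, and the shared generator state is untouched.
same_output = p_first == p_second
global_state_untouched = random.getstate() == state_before
print(same_output, global_state_untouched)
```

Calling random.seed(in_str) on the module itself would give the same repeatability, but it would reset the sequence seen by every other user of the shared generator.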

Using hash

A cryptographic hash is assumed to be a uniformly distributed integer in the range [0, MAX_HASH]. Accordingly, it can be scaled to a floating-point number in the range [0, 1) by dividing it by MAX_HASH + 1.

    import hashlib

    Hash = hashlib.sha512
    MAX_HASH_PLUS_ONE = 2**(Hash().digest_size * 8)

    def str_to_probability(in_str):
        """Return a reproducible uniformly random float in the interval [0, 1) for the given string."""
        seed = in_str.encode()
        hash_digest = Hash(seed).digest()
        hash_int = int.from_bytes(hash_digest, 'big')  # Uses explicit byteorder for system-agnostic reproducibility
        return hash_int / MAX_HASH_PLUS_ONE  # Float division

    >>> str_to_probability('a3b2Foobar')
    0.3659629991207491
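One possible variant, offered as a sketch and not as part of the answer above: a double has a 53-bit significand, so dividing the full 512-bit integer rounds the result. A digest extremely close to the maximum would in principle round up to exactly 1.0 (the chance is roughly 2**-54, negligible in practice). Keeping only the top 53 bits avoids that edge case and uses arithmetic that fits in a 64-bit integer, which may be easier to port to other languages. The names str_to_probability_53bit and is_in_rollout, and the ':' separator between salt and user id, are hypothetical.

```python
import hashlib

def str_to_probability_53bit(in_str: str) -> float:
    """Map a string to a float in [0, 1) using the top 53 bits of its SHA-512 digest."""
    digest = hashlib.sha512(in_str.encode('utf-8')).digest()
    top_64 = int.from_bytes(digest[:8], 'big')  # first 8 bytes, explicit byte order
    top_53 = top_64 >> 11                       # keep 53 bits: exactly representable as a double
    return top_53 / 2**53                       # largest possible value is (2**53 - 1) / 2**53

def is_in_rollout(user_id: str, salt: str, fraction: float) -> bool:
    """Hypothetical helper: deterministically select about `fraction` of users."""
    return str_to_probability_53bit(salt + ':' + user_id) < fraction

p = str_to_probability_53bit('a3b2Foobar')
print(0.0 <= p < 1.0)
print(is_in_rollout('a3b2Foobar', 'experiment-1', 0.5))
```

The value differs from the one returned by the full-width division above, so the two variants should not be mixed within one application.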

Notes:

  • The built-in hash function must not be used, because it can preserve the distribution of the input, e.g. with hash(123). Alternatively, it can return values that differ when Python is restarted, e.g. with hash('123').
  • Using modulo is not required.
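The first note can be checked directly. The sketch below assumes CPython, where small non-negative integers hash to themselves; sha_probability is a hypothetical helper that repeats the hash-based scaling for comparison.

```python
import hashlib

def sha_probability(value) -> float:
    """Hypothetical helper: scale the SHA-512 digest of str(value) into [0, 1)."""
    digest = hashlib.sha512(str(value).encode()).digest()
    return int.from_bytes(digest, 'big') / 2**512

small_ints = list(range(5))

# CPython behavior: the built-in hash returns these inputs unchanged,
# so whatever pattern the inputs have is carried into the output.
builtin_hashes = [hash(n) for n in small_ints]
print(builtin_hashes)  # [0, 1, 2, 3, 4] in CPython

# The cryptographic hash is expected to scatter the same inputs across [0, 1).
sha_values = [sha_probability(n) for n in small_ints]
print(sha_values)
```

For strings, the built-in hash is salted per process by default, so hash('123') generally changes between interpreter runs unless the PYTHONHASHSEED environment variable is fixed.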