Converting an integer to a random but deterministically repeatable choice

How do I convert an unsigned integer (representing a user ID) to a random, but actually deterministically repeated choice? The choice should be chosen with equal probability (regardless of the distribution of input integers). For example, if I have 3 options, i.e. [0, 1, 2] , user identifier 123 can always be arbitrarily assigned by selection 2, while user identifier 234 can always be assigned selection 1.

Cross-language and cross-platform algorithmic reproducibility is desirable. I tend to use a hash function and a module if there is no better way. Here is what I have:

 >>> num_choices = 3 >>> id_num = 123 >>> int(hashlib.sha256(str(id_num).encode()).hexdigest(), 16) % num_choices 2 

I use the latest stable Python 3. Note that this question is similar, but not exactly identical to its related question, converting a string into a random but deterministically repeated uniform probability .

+7
python random
source share
4 answers

Using random

The random module can be used with id_num as its seed, and also addresses issues related to both thread safety and continuity. Using randrange in this way is comparable and simpler than hashing the seed and accepting modulo.

However, the problem associated with cross algorithmic reproduction is a problem.

 import random def id_to_choice(id_num, num_choices): localrandom = random.Random(id_num) choice = localrandom.randrange(num_choices) return choice >>> id_to_choice(123, 3) 0 >>> id_to_choice(456, 3) 2 

Using hash and modulo

 import hashlib def id_to_choice(id_num, num_choices): id_bytes = id_num.to_bytes((id_num.bit_length() + 7) // 8, 'big') id_hash = hashlib.sha512(id_bytes) id_hash_int = int.from_bytes(id_hash.digest(), 'big') # Uses explicit byteorder for system-agnostic reproducibility choice = id_hash_int % num_choices # For small num_choices return choice >>> id_to_choice(123, 3) 0 >>> id_to_choice(456, 3) 1 

Notes:

  • The built-in hash method should not be used, since it can store distribution input, for example. with hash(123) . Alternatively, it can return values ​​that are different when restarting Python, for example. with hash('123') .

  • To convert int to bytes, bytes(id_num) works, but is extremely inefficient, since it returns an array of null bytes and therefore should not be used. Using int.to_bytes better. Using str(id_num).encode() works, but spends a few bytes.

  • Admittedly, using modulo does not give an evenly uniform probability, [1] [2] , but this should not be shy for this application, because id_hash_int is id_hash_int be very large and num_choices is considered small.

+7
source share

An alternative is encryption of the user ID. If you keep the encryption key the same, then each input number will be encrypted to a different output number up to the size of the encryption block used. DES uses 64-bit blocks that span identifiers 000000 to 18446744073709551615. This will give a random replacement for the user ID, which is guaranteed to not give two different user IDs the same "random" number, since encryption is a one-to-one permutation of the block values.

0
source share

I apologize that I do not have a Python implementation, but I have a very clear, understandable, and self-evident implementation in Java that should be easily translated into Python with minimal effort. The following produce long predictable uniformly distributed sequences spanning the entire range except zero

XorShift ( http://www.arklyffe.com/main/2010/08/29/xorshift-pseudorandom-number-generator )

 public int nextQuickInt(int number) { number ^= number << 11; number ^= number >>> 7; number ^= number << 16; return number; } public short nextQuickShort(short number) { number ^= number << 11; number ^= number >>> 5; number ^= number << 3; return number; } public long nextQuickLong(long number) { number ^= number << 21; number ^= number >>> 35; number ^= number << 4; return number; } 

or XorShift128Plus (you must reseed state and state1 to non-zero values ​​before use, http://xoroshiro.di.unimi.it/xorshift128plus.c )

 public class XorShift128Plus { private long state0, state1; // One of these shouldn't be zero public long nextLong() { long state1 = this.state0; long state0 = this.state0 = this.state1; state1 ^= state1 << 23; return (this.state1 = state1 ^ state0 ^ (state1 >> 18) ^ (state0 >> 5)) + state0; } public void reseed(...) { this.state0 = ...; this.state1 = ...; } } 

or XorOshiro128Plus ( http://xoroshiro.di.unimi.it/ )

 public class XorOshiro128Plus { private long state0, state1; // One of these shouldn't be zero public long nextLong() { long state0 = this.state0; long state1 = this.state1; long result = state0 + state1; state1 ^= state0; this.state0 = Long.rotateLeft(state0, 55) ^ state1 ^ (state1 << 14); this.state1 = Long.rotateLeft(state1, 36); return result; } public void reseed() { } } 

or SplitMix64 ( http://xoroshiro.di.unimi.it/splitmix64.c )

 public class SplitMix64 { private long state; public long nextLong() { long result = (state += 0x9E3779B97F4A7C15L); result = (result ^ (result >> 30)) * 0xBF58476D1CE4E5B9L; result = (result ^ (result >> 27)) * 0x94D049BB133111EBL; return result ^ (result >> 31); } public void reseed() { this.state = ...; } } 

or XorShift1024Mult ( http://xoroshiro.di.unimi.it/xorshift1024star.c ) or Pcg64_32 ( http://www.pcg-random.org/ , http://www.pcg-random.org/download.html )

0
source share

The easiest way is modulo user_id by the number of parameters:

 choice = user_id % number_of_options 

It is very easy and fast. However, if you know user_id, you can guess the algorithm.

In addition, pseudo-random sequences can be obtained from random , seeded with user constants (for example, user_id ):

 >>> import random >>> def generate_random_value(user_id): ... random.seed(user_id) ... return random.randint(1, 10000) ... >>> [generate_random_value(x) for x in range(20)] [6312, 2202, 927, 3899, 3868, 4186, 9402, 5306, 3715, 7586, 9362, 7412, 7776, 4244, 1751, 3424, 5924, 8553, 2970, 709] >>> [generate_random_value(x) for x in range(20)] [6312, 2202, 927, 3899, 3868, 4186, 9402, 5306, 3715, 7586, 9362, 7412, 7776, 4244, 1751, 3424, 5924, 8553, 2970, 709] >>> 
-one
source share

All Articles