How to create a predictable sequence shuffle without first creating the entire sequence?

Question

How to create a predictable sequence shuffle without first creating the entire sequence?

The following python code describes exactly what I want to achieve for a sequence of arbitrary size (collection):

import random fixed_seed = 1 #generate the same sequence every time with a fixed seed population = 1000 sample_count = 5 #demonstration number num_retries = 3 #just enough to show the repeatable behaviour for trynum in xrange(num_retries): #generate the fresh/ordered sequence (0->population)... seq = range(population) #seed the random number generator the same way every time... random.seed(fixed_seed) #shuffle the sequence... random.shuffle(seq) #display results for this try... sample_sequence = [str(x) for x in seq[:sample_count]] print "try %s: %s..." % (trynum + 1, ", ".join(sample_sequence)) #Sample output... #try 1: 995, 721, 62, 326, 541... #try 2: 995, 721, 62, 326, 541... #try 3: 995, 721, 62, 326, 541...

The problem with this method is that it requires generating everything first in memory. This can be a problem for large populations.

Note that the potentially big advantage of this method is that you can select any position in the array at any time.

Now - if the problem at hand allows you to set the population size to a power of two (minus 1), a linear feedback shift register can be used to obtain a predictable random sequence. LFSRs are neat and well described in the wikipedia article .

The following is a sample Python code (and I did a bunch of uniqueness tests to make sure that it works as advertised). See the wikipedia article for an explanation of how the code works ( Galois Configuration ).

 TAP_MASKS = { #only one needed, but I included 3 to make the code more useful 10: 0x00000240, #taps at 10, 7 16: 0x0000B400, #taps at 16, 14, 13, 11 32: 0xE0000200, #taps at 32, 31, 30, 10 } def MaxLengthLFSR(seed, register_length): "Gets next value from seed in max-length LFSR using Galois configuration." lsb = seed & 1 next_val = seed >> 1 if lsb == 1: mask = TAP_MASKS[register_length] next_val ^= mask return next_val reglen = 16 #number of bits in register population = (2**reglen) - 1 #not used, just showing it fixed_seed = 1 #seed == startval in this case (could randomize in population) sample_count = 5 #demonstration number num_retries = 3 #just enough to show the repeatable behaviour for trynum in xrange(num_retries): next_val = fixed_seed seq = [fixed_seed, ] for x in xrange(sample_count - 1): next_val = MaxLengthLFSR(next_val, reglen) seq.append(next_val) seq = [str(x) for x in seq] print "try %s: %s..." % (trynum + 1, ", ".join(seq)) #Sample output... #try 1: 1, 46080, 23040, 11520, 5760... #try 2: 1, 46080, 23040, 11520, 5760... #try 3: 1, 46080, 23040, 11520, 5760...

This is good because you can have a HUGE population and easily calculate a repeatable sequence of repeating random numbers without using a large chunk of memory.

Disadvantages: a) that it is limited to “shuffling” size sequences (2 ** N - 1) and b) that you cannot determine what value of a particular position in a random sequence is located at an arbitrary location. You need to know the meaning at a certain point and walk from there.

The latter (b) is mostly approved, as most of the time you will generate the sequence in order, so you just need to remember the last value. The strength of constraint 2 (a) is the kind of deal killer, though ... depending on the application.

How do you achieve non-duplicate results with a maximum LFSR length for an arbitrary sequence length?

As a bonus, it would be nice to have a solution in which you can find out the number in the given position of the sequence without missing the sequence in this position.

Note: if you need a good starting set of LFSR crane locations for LFSRs of maximum length (those that generate the entire set of registers without repeating once), this link is pretty good and has a huge number of tap locations per register size (up to 32 bits, anyway )

Also note that I saw many questions closely related to my question and shuffling / LFSR, but none of them relate to what I need (predictable movement of an arbitrary linear sequence). Or at least as far as I could understand them, anyway.

I recently studied linear congruent generators that seem promising, but I have not been able to get them to work so far. Instead of sitting on the question further, I will ask him and send an answer if I find out and they work.

+4

python algorithm random

Russ Oct 11 '10 at 21:29

source share

2 answers

First of all, note that this is not a random sequence. It generates only one fixed repeating sequence, and the seed selects where in the initial sequence. Same as all PRNGs, of course, but usually the PRNG cycle is much larger than 16-bit or 32-bit. As you described it, the length of the loop is equal to the number of elements you repeat, so all you have to do is take one “shuffled” order and change it when you start. A seed value is more like an initial index than a seed.

This is not the most satisfactory answer, but it is probably practical: you can impose a length on the next power of two and skip any indexes above the actual maximum. Thus, if you have 5000 elements, generate a sequence of 8192 elements and discard any results between [5000.8191]. The overhead from this sounds ugly, but in the long run it is not so bad: since it can double the length of the list, on average you have to give up one of two results, so the average cost in the worst case doubles the amount of work.

The following code demonstrates this (and also shows a cleaner way to implement it). The third parameter, MaxLengthLFSR, if specified, is the actual maximum value. You will probably want to populate TAP_MASKS for more sizes, and then choose the smallest register size that matches the required sequence length; here we just use the requested one, which works, but will cause much more overhead if the length of the sequence is much longer than it should be.

 TAP_MASKS = { # only one needed, but I included 3 to make the code more useful 10: 0x00000240, # taps at 10, 7 16: 0x0000B400, # taps at 16, 14, 13, 11 32: 0xE0000200, # taps at 32, 31, 30, 10 } def MaxLengthLFSR(next_val, reglen, max_value=None): """Iterate values from seed in max-length LFSR using Galois configuration.""" # Ensure that max_value isn't 0, or we'll infinitely loop without yielding any values. if max_value is not None: assert max_value > 0 while True: if max_value is None or next_val < max_value: yield next_val lsb = next_val & 1 next_val = next_val >> 1 if lsb == 1: mask = TAP_MASKS[reglen] next_val ^= mask sample_count = 5 # demonstration number num_retries = 3 # just enough to show the repeatable behaviour for trynum in xrange(num_retries): it = MaxLengthLFSR(1, 16, 2000) seq = [] for x in xrange(sample_count): seq.append(next(it)) seq = [str(x) for x in seq] print "try %s: %s..." % (trynum + 1, ", ".join(seq))

+1

Glenn maynard Oct 11 '10 at 22:28

source share

Nick johnson · Accepted Answer · 2010-10-11T22:23:02+0000

I already wrote about this before: Protected permutations with block ciphers . In a nutshell:

Yes, you can use LFSR to generate permutations with a length equal to a power of 2. You can also use any block cipher. With block encryption, you can also find an element with index n or an index for element n.
To create a permutation with an arbitrary length l, create one with the least cardinality 2 of length greater than l. If you want to find the element of the nth permutation, apply the permutation function, and if it generates a number outside the required range, apply it again; repeat until the number is in the valid range.

The number of iterations required for step 2 will be no more than 2; the worst case is high, but extremely unlikely.

How to create a predictable sequence shuffle without first creating the entire sequence?

More articles: