An iterator to create a unique random order?

The problem is formulated as follows: we have a very large number of elements that intersect through the iterator template (which dynamically creates or selects) the requested element.

Due to the large number of elements and, therefore, cannot be stored in memory (for example, as a list).

What is the procedure for an iterator to follow in order to create a random order of elements each time the iterator is called. Unique random order means that in the end, all objects intersect only once but returns in random order .

if the number of elements is relatively small, you can solve this problem as follows:

  • Store items in a list in memory (or in additional memory)
  • Shuffle list
  • Drag the shuffled list.

For this question, we can assume that an iterator can index elements (or rank / unblock them). Thus, the iterator can get the ith element for all indices i in the range of elements.

Note random order should be evenly distributed in the set of all orderings of positions or in other words be objective . This condition excludes solutions that randomize the list of elements in the block diagram (in order, for example, to have some elements in memory and randomize only them, then the next block of elements, etc.)

+2
iterator language-agnostic random-sample
Mar 11 '15 at 15:31
source share
2 answers

Encryption is reversible, so encryption is a one-to-one mapping from a set onto itself.

Choose a block cipher with a block size large enough to cover the number of elements you have.

Encrypt numbers 0, 1, 2, 3, 4, ... This will give you a non-repeating ordered list of numbers up to 2 ^ (block size).

If the encrypted number is too large, ignore it. If the encrypted number is within your list of items, select that item. Repeat for any number of items you need.

Cypher with a variable block size (e.g. Hyp Pypding cypher) will reduce the number of misses.

+3
Mar 11 '15 at 16:48
source share

I will present a plan to solve this problem as follows:

It's all about coding an index (successor) to create random order.

Since the mentioned (reversible) encryption is such a scheme.

Let's look at the general scheme of such schemes:

Suppose an iterator uses the following function to get the next index / element. For the normal case, the successor function will be:

def succ(index): return index+1 

Returns the next index. When viewed as a binary operation, the index+1 code creates a new index template (due to the lack of a better word) from the current index template.

Presumably, you can generalize this pattern (in binary operations, i.e. logN) to create a succesor function that returns the next index, but passed in random or random unique order.

For example, encryption procedures can be considered as such a scheme.

An even simpler scheme would be to use modular arithmetic, similar to a pseudo-random number generator, with suitable parameters ie:

 def succ_rand_modulo(index): return (seed*index+step) mod numItems 

For suitable seed options and step (for example, relatively simple and large enough) numbers to create random orderings with a homogeneous property (i.e. unbiased )

Linear mapping, as indicated above, may not produce all orderings; we can assume polynomial mappings with freer settings for settings that can perform all permutations / orderings (see Permutation polynomials (over finite fields) )

Effectively, all of these methods are methods of transcoding indexes (for example, encryption, modular arithmetic, pseudo-random generation, binary manipulation, etc.).

I would say that the most efficient way would be (polynomial) modulo (i.e. generating pseudorandom numbers), provided that the parameters are selected accordingly to be able to uniformly / unbiased distribution over all random orderings (see above).

UPDATE:

Another way that you want to store only numItems bits in memory (but it takes more time) is by the reject method. You can randomly generate or retrieve one index / element and set the corresponding bit in the bit vector to 1 (selected element), then produce another random index, and if the corresponding bit is set (the index is already in use), reject it and re-produce another random index, and so on. Further. On average, this will have O(numItems) , but the worst case may be long.

UPDATE2:

Another promising approach, as well as optimal in various cases, is the Lecture Notes on Lexicographic Generation method (also referring to random generation) .

Links to similar issues / similar approaches:

Unique random number generation schemes

Encryption / Encryption Schemes

Coding based schemes / number theory schemes

0
Mar 11 '15 at 20:21
source share



All Articles