Generate random permutation of a huge list (in Python)

I would like to create a random permutation of numbers [1,2,...,N] , where N is a large number. Therefore, I do not want to store all the elements of the permutation in memory, but rather to iterate over the elements of my specific permutation, without preserving the old values โ€‹โ€‹in memory.

Any idea how to do this in Python?

+5
source share
2 answers

One possibility is to use encryption. Since encryption is reversible, that is, individually, for this key you will receive the same numbers that you encrypt, but in a different order.

You need a block cipher with a block size sufficient to include your maximum N. Use DES in ECB mode for N = 2 ^ 64 - 1. Use AES in ECB mode for N = 2 ^ 128 - 1. For other sizes, or use Hasty Pudding cipher , which has a variable block size, or write your own simple Feistel cipher . I assume that you just need to shuffle, not a cryptographically secure shuffle.

If the result is greater than N, then simply re-encrypt it until it is less than N, the 1-to-1 property ensures that the chain of large numbers is also unique.

There is no need to store the entire array in memory, each number can be encrypted as needed. Only a key and an encryption algorithm are required. One minor complication is that block ciphers work on [0 ... N-1]; You may need additional code to eliminate extremes.

+5
source

This is a common problem, not a Python specific one. In most languages, even when iterators are used to use structures, the entire structure is stored in memory. Thus, iterators are mainly used as โ€œfunctionalโ€ tools, and not as memory optimization tools.

In python, many people use large memory due to the presence of really large structures (dictionaries, etc.). However, all program object variables will be stored in memory in any way. The only solution is to serialize the data (saving to the file system, database, etc.).

So, in your case, you can create a custom function that will create a list of permutations. But instead of adding each permutation element to the list, he saved the element either in a file (or in a database with the corresponding structure). Then you can get one at a time each permutation from a file (or database), without bringing the entire list to memory.

However, as mentioned earlier, you always need to know which permutation you are currently in. To avoid extracting all the created permutations from the database (which would create the same bottleneck), you could have an index for each location containing the character used in the previously generated permutation (and create permutations that add characters and a predefined sequence).

0
source

All Articles