I have a very large range/set of numbers, (1..1236401668096), which I would essentially like to "shuffle", i.e. visit in a random order without ever repeating a number. The plan is to run a web service, and each time a request arrives, increment a counter and hand out the next "shuffled" number from the range. The algorithm needs to cope with the server going offline, meaning it should be able to resume the walk from the saved counter value (much like you can seed a pseudo-random number generator and reproduce the same sequence, so that the seed plus the iteration count tells you exactly where you are).
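To make that concrete, here is roughly the interface I am hoping for (a Ruby sketch; `permute`, the counter file name, and the seed are all placeholders of mine, not something I already have):

```ruby
RANGE_SIZE   = 1_236_401_668_096
SEED         = 42              # arbitrary fixed seed, purely for illustration
COUNTER_FILE = "counter.txt"   # hypothetical place to persist the counter

# The function I'm looking for: a deterministic, seed-keyed bijection on
# 1..RANGE_SIZE, so permute(i) != permute(j) whenever i != j, and the whole
# sequence can be recomputed from the counter alone after a restart.
# Identity is only a stand-in -- what goes here is exactly my question.
def permute(index)
  index
end

def next_shuffled_number
  counter = File.exist?(COUNTER_FILE) ? File.read(COUNTER_FILE).to_i : 0
  counter += 1
  File.write(COUNTER_FILE, counter.to_s)  # persisted so a restart resumes here
  permute(counter)
end

puts next_shuffled_number
```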
I am wondering whether such an algorithm exists (or can exist). I have seen the Fisher-Yates shuffle, but its first step is "write down the numbers from 1 to N", which would take on the order of a terabyte just to store my entire set. Generating a pseudo-random number for each request could work for a while, but as the database/tree of used numbers fills up, collisions will become more frequent and may degrade performance (by my calculations there is already about a 0.08% chance of a collision after 1 billion draws). Is there a better solution for my scenario, or is this just wishful thinking?
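For reference, the PRNG-plus-collision-handling approach I mean would look something like the sketch below (an in-memory `Set` stands in for the real database/tree of used numbers, which is exactly the part I expect to become the bottleneck):

```ruby
require 'set'

RANGE_SIZE = 1_236_401_668_096

# Draw random numbers until we hit one we haven't served before.
# Every retry is a "collision"; they get more frequent as `seen` grows.
def next_unseen(rng, seen)
  loop do
    candidate = rng.rand(1..RANGE_SIZE)
    next if seen.include?(candidate)  # collision: draw again
    seen << candidate
    return candidate
  end
end

rng  = Random.new(42)  # seeded, but the collision state still has to be persisted
seen = Set.new         # stand-in for the persistent database / tree
10.times { puts next_unseen(rng, seen) }
```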
The reason for the shuffling is that being able to correctly guess the next number in the sequence could open up a minor DoS vulnerability in my application, but also because the presentation looks much better with a wide distribution of numbers (I'd rather not go into detail about what the application does). At the moment I am considering either using a PRNG and handling collisions, or shuffling the range in blocks (starting with (1..10000000).to_a.shuffle, then (10000001..20000000).to_a.shuffle, and so on, moving to the next block as each one runs out); the block idea is sketched below.
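Here is roughly what I have in mind for the block-by-block variant (a sketch only; in practice the current block and position would have to be persisted somewhere, and the obvious downside is that every number in block k is seen before anything in block k+1):

```ruby
BLOCK_SIZE = 10_000_000
RANGE_SIZE = 1_236_401_668_096

# Walks the whole range block by block: each block of 10M numbers is shuffled
# in memory and handed out before moving on to the next block. Numbers never
# repeat, but the distribution is only "wide" within the current block.
def each_shuffled_number(seed = 42)
  return enum_for(:each_shuffled_number, seed) unless block_given?
  rng   = Random.new(seed)
  start = 1
  while start <= RANGE_SIZE
    stop  = [start + BLOCK_SIZE - 1, RANGE_SIZE].min
    block = (start..stop).to_a.shuffle(random: rng)  # roughly 80 MB per block
    block.each { |n| yield n }
    start = stop + 1
  end
end

# Usage: pull numbers lazily, one per request.
numbers = each_shuffled_number
5.times { puts numbers.next }
```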
Do any mathematicians out there have better ideas or suggestions?