Implementing Shuffling on a Sky Jukebox

Question

Implementing Shuffling on a Sky Jukebox

How to implement shuffle for "Celestial Jukebox"?

More precisely, at each moment of time t return a uniform random number between 0..n (t), so that in the whole sequence there are no repetitions, and n () increases with time.

For a specific example, suppose a fixed-speed music service allows you to play any song in a directory using an index number based on 0. Each new song is added so often that increase the range of index numbers. The goal is to play a new song each time (in the absence of duplicates in the catalog).

the perfect solution would be feasible on existing hardware - how would I run a list of six million songs in 8 MB of DRAM? Likewise, a high number of songs exacerbate O (n) selection timings.

- For an LCG generator, given the partially exhausted LCG at 0..N ₀ , it can be transferred to another LCG at 0..N ₁ (where N1> N0), which do not repeat the exhausted sequence.
- Checking that a certain song has already been played seems to be growing out of control quickly, although this may be the only way? Is there an effective data structure for this?

+4

language-agnostic math algorithm random puzzle

caffiend May 13, '09 at 19:59

source share

5 answers

With a limit of 8 MB for 6 million songs, there is simply no place to store even one 32-bit integer for each song. If you are not ready to store the list on disk (in this case, see below).

If you are ready to abandon the requirement to immediately add new elements to a random move, you can create an LCG on top of the current set of songs, and then, when this is exhausted, create a new LCG only for songs that have been since you started. Rinse and repeat until you have new songs. You can also use this rather cool algorithm , which generates an unidentified permutation in an arbitrary range without saving it.

If you are ready to abandon the requirement of 8 MB of RAM for 6 million songs or go to disk (for example, on a memory card), you can generate a sequence with 1..n at the beginning, shuffle it with fishermen-yates, and each time when a new song is added, select a random item from a section that is not so far played, insert a new identifier there and add the original identifier to the end of the list.

If you don’t really care about the efficiency of calculations, you can save a bitmap image of all the songs and repeatedly select identifiers evenly randomly until you find the one that you have not played yet. It will take 6 million attempts to find the last song (on average), which still damn fast works on a modern processor.

+1

Nick johnson May 13, '09 at 23:08

source share

While Erich's solution is probably better for your specific use case, checking that the song has already played very fast (amortized O (1)) with a hash structure like set in Python or hashset<int> in C ++.

0

Alex martelli May 13, '09 at 20:12

source share

You can simply generate a sequence of numbers from 1 to n, and then shuffle it using Shuffle Fisher-Yates. This way you can guarantee that the sequence will not be repeated, regardless of n.

0

Joey May 13, '09 at 20:23

source share

You can use a linked list inside an array: To create an original playlist, use an array containing the following:

  struct playlistNode{ songLocator* song; playlistNode *next; }; struct playlistNode arr[N];

Also keep the pointer "head" and "freelist";

Fill it in 2 passes:
1. fill arr with all the songs in the directory in the order 0..N.
2. randomly iterate over all indexes, filling in the next pointer;

Deleting Playable Songs - O (1):

 head=cur->next; cur->song=NULL; freelist->next = freelist; cur->next=freelist; freelist=cur;

Inserting new songs is O (1) as well: randomly select the index of the array and fix the new node.

 node = freelist; freelist=freelist->next; do { i=rand(N); } while (!arr[i].song); //make sure you didn't hit a played node node->next = arr[i].next; arr[i].next=node;

0

Ashelly May 13, '09 at 21:10

source share

Erich mirabal · Accepted Answer · 2009-05-13T20:05:20+0000

The way I like to make such random random choices is to have a list, and every time I select a random case between [0-N) , I remove it from this list. In your case, when new items are added to the catalog, it will also be added to the list of not yet selected. Once you get to the end, just reload all the songs back to the list.

EDIT:

If you take into account sentence v3, this can be done mainly O(1) after stage O(N) . This ensures non-repetitive random selection.

Here is a brief description:

Add starting items to list
Choose index i in random order (from the set [0,N) )
Delete item at index i
Replace the hole in i with the element Nth (or null if i == Nth ) and decrease N
For new items, just add at the end of the list and increment N as needed
If you ever play with all the songs (which I doubt if you have 6M songs), add all the songs to the list, pour, rinse and repeat.

Since you are trying to deal with fairly large sets, I would recommend using a DB. A simple table with basically two fields: id and " pointer " (where " pointer " is what tells you the song to play, which may be a GUID, FileName, etc., Depending on how you want it to do). Have an index on id and you should get very decent performance while maintaining between app launches.

EDIT for 8 MB limit:

Umm, this complicates things a bit ... In 8 MB you can store a maximum of ~ 2 M records using 32-bit keys.

So, I would recommend pre-selecting the following 2M entries. If the user plays through 2M songs for life, hell! To pre-select them, perform the pre-initialization step using the above algorithm. The only change I would make is that by adding new songs, roll the bones and see if you want to accidentally add this song to the mix. If so, select a random index and replace it with the new song index.

Implementing Shuffling on a Sky Jukebox

More articles: