Are Collections.shuffle () random enough? Practical examples seem to deny this claim.

Question

Are Collections.shuffle () random enough? Practical examples seem to deny this claim.

I have 1000 unique objects in java.util.List , each of which refers to an image, each image in the 1000 list is unique, and now I would like to shuffle them so that I can use the first 20 objects and present them to the website user . Then the user can click the "Shuffle" button, and again I get 1000 images from scratch and call shuffle() again. However, it seems that out of 1000 image objects, I often see the same image again and again between 20-shaped selections.

Something seems wrong, any best suggestion, tips?

My code is very simple:

 List<String> imagePaths = get1000Images(); Collections.shuffle(imagePaths); int i = 0; for (String path: imagePaths) { ... do something with the path ... i++; if (i >= 20) break; }

I know that Collections.shuffle() well distributed: see, for example, http://blog.ryanrampersad.com/2012/03/03/more-on-shuffling-an-array-correctly/

However, I just get the feeling that the probability of seeing the same image again and again in a set of 20 images out of 1000 should be much less ...

Inputs are highly appreciated.

+7

java collections arrays random shuffle

basZero Mar 14 '12 at 12:06

source share

6 answers

Its human nature to see patterns that are not there. Many people see paintings in planets and stars, directing their lives.

The first 1000 PI digits have six nines in a row. Does this mean that PI numbers are not random? not. The pattern is not repeated more than you might expect.

Having said that Random is not completely random, and it will be repeated after calls 2 ^ 48. (it uses 48-bit seed). This means that it is not possible to use all possible long or double using it. If you want more randomness, you can use SecureRandom instead of shuffling.

It looks like you want something like this

 List<String> imagePaths = new ArrayList<>(); // called repeatedly if (imagePaths.size() <= 500) { imagePaths = get1000Images(); Collections.shuffle(imagePaths); } for (String path: imagePaths.subList(0, 20)) { ... do something with the path ... } imagePaths = imagePaths.subList(20, imagePaths.size());

This ensures that you do not see the same image in the last 500 calls.

+28

Peter Lawrey Mar 14 '12 at 12:39

source share

Your intuition is correct for a particular image [you are unlikely to see a particular image again and again], but not for a general image [you are likely to see some image repeating]. This is one of these places in the likelihood that our automatic intuition is wrong ...

This reminds me of a birthday paradox , which contradicts intuition and says - for a group of 23 people, the probability that 2 of them have the same birthday is 0.5, much more than intuition expects!

+5

amit Mar 14 '12 at 12:10

source share

I shuffled 52 times four times and each time iterated over the same card in the same slot, which gave me about 14 of the 208 cards, which was about 93.3% random.

+1

Nicholas Oct 24 '13 at 15:38

source share

Following your question, I wrote the following program. I created a list of consecutive integers and shuffled it 10, 100, 1000, and 10,000 times. After each series of tasses, I checked the value of the element in the 5th position of the array and created an array of counters: how many times each number appears in the 5th position.

Here is the program:

 public class MyTest { public static void main(String[] args) { int n = 10; List<Integer> list = new ArrayList<Integer>(); for (int i = 0; i < n; i++) { list.add(i); } int[] counters = new int[n]; for(int shuffles : new int[] {10, 100, 1000, 10000}) { Arrays.fill(counters, 0); for (int i = 0; i < shuffles; i++) { Collections.shuffle(list); // check 5-th element int fifth = list.get(5); counters[fifth] = counters[fifth] + 1; } System.out.println(shuffles + ": " + Arrays.toString(counters)); } } }

And here are the results:

10: [0, 1, 1, 1, 2, 0, 0, 3, 2, 0] 100: [11, 9, 9, 7, 10, 12, 13, 13, 8, 8] 1000: [100 , 101, 107, 101, 95, 96, 109, 83, 93, 115] 10000: [1015, 942, 990, 1003, 1015, 1037, 977, 1060, 950, 1011]

As you can see, "randomness" depends on the amount of shuffling. If you shuffle the array 10 times, the minimum counter is 0, and the maximum is 3. The difference between these values for 100 shuffles (in percent) is much smaller. The numbers are almost the same for a 10,000 shuffle.

I think this test simulates your use case: you display images at a specific position in a shuffled collection.

Please see the @amit post which describes the meaning of shuffling.

So, the solution for you is to shuffle your array 10 times.

EDIT: @Dave Webb gave a great explanation for this case.

The second thinking is this: in fact, you don’t need to shuffle the list of 1000 elements in order to extract the first 20 elements from it. It is enough to take 20 random elements. You will get the same effect, but a much more effective solution:

 Set<Image> show = new HashSet<Image>(); Random r = new Random(System.currentTimeMillis()); for (int i = 0; show.size() < 20; i++) { show.add(list.get(r.nextInt())); }

0

Alexr Mar 14 '12 at 12:26

source share

With this code, if you see the same image over and over, it means that the same image exists many times in the list. At Whreever you get 1000 images, there are duplicates.

-2

Graham borland Mar 14 '12 at 12:09

source share

Dave webb · Accepted Answer · 2012-03-14T12:19:20+0000

If you show 20 images out of 1000, the probability of seeing any of these 20 repeating in the next iteration is approximately 0.34, so you should not be surprised that the images are repeated.

The chances of seeing a particular image are still one in a thousand, but if you're looking for twenty images, the chances are much higher.

We can calculate the probability that none of the previous 20 images will be repeated as:

  980 979 961 ———— × ——— × ... × ——— ≈ 0.66 1000 999 981

Thus, the probability of seeing a repetition is equal to minus this, or approximately 0.34.

And the probability of seeing an image repeating in either of the following two iterations is as follows:

 1 - (0.66 × 0.66) ≈ 0.56

In other words, you will most likely see a duplicate image in the next two cycles. (And this does not mean that the images are repeated from the second cycle in the third, which will only make it more likely.)

For what it's worth, here is some Java code to perform the above calculation:

 float result = 1.0f; int totalImages = 1000; int displayedImages = 20; for (int i = 0; i < displayedImages; i++) { result = result * (totalImages - displayedImages - i) / (totalImages - i); } System.out.println(result);

Are Collections.shuffle () random enough? Practical examples seem to deny this claim.

More articles: