How can I improve this C# randomization method?

I settled on this as the simplest and most testable way to randomize a list, but I would be interested to hear about any improvements.

 public static IList<T> RandomiseList<T>(IList<T> list, int seed)
 {
     Random random = new Random(seed);
     List<T> takeFrom = new List<T>(list);
     List<T> ret = new List<T>(takeFrom.Count);

     while (takeFrom.Count > 0)
     {
         int pos = random.Next(0, takeFrom.Count - 1);
         T item = takeFrom[pos];
         takeFrom.RemoveAt(pos);
         ret.Add(item);
     }

     return ret;
 }
+4
7 answers

You want to shuffle, and the best way to do that is the Fisher-Yates shuffle:

 public static IList<T> Randomise<T>(IList<T> list, int seed)
 {
     Random rng = new Random(seed);
     List<T> ret = new List<T>(list);
     int n = ret.Count;

     while (n > 1)
     {
         n--;
         int k = rng.Next(n + 1);

         // Simple swap of the two elements
         T tmp = ret[k];
         ret[k] = ret[n];
         ret[n] = tmp;
     }

     return ret;
 }
+17

I liked Dennis Palmer's idea of returning a shuffled IEnumerable instead of shuffling the list in place, but using the RemoveAt method makes it slow. Here is an alternative without RemoveAt:

 public static IEnumerable<T> Shuffle<T>(IEnumerable<T> list, int seed)
 {
     Random rnd = new Random(seed);
     List<T> items = new List<T>(list);

     for (int i = 0; i < items.Count; i++)
     {
         // pick a random remaining element, yield it, then move the
         // current front element into the vacated slot
         int pos = rnd.Next(i, items.Count);
         yield return items[pos];
         items[pos] = items[i];
     }
 }

I timed this with 10,000 integers, and it is about 30 times faster.
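A rough harness for reproducing that comparison might look like the sketch below. This is my reconstruction rather than the original benchmark: it assumes the Shuffle method above and the RemoveAt-based RandomiseList from the other answer are in scope, and absolute timings will vary by machine.

 using System;
 using System.Diagnostics;
 using System.Linq;

 static class ShuffleTiming
 {
     static void Main()
     {
         int[] numbers = Enumerable.Range(0, 10000).ToArray();

         // Count() drains the lazy iterator so the full shuffle is timed
         Stopwatch sw = Stopwatch.StartNew();
         Shuffle(numbers, 42).Count();
         sw.Stop();
         Console.WriteLine("yield-based:    {0} ms", sw.ElapsedMilliseconds);

         sw.Restart();
         RandomiseList(numbers, 42).Count();
         sw.Stop();
         Console.WriteLine("RemoveAt-based: {0} ms", sw.ElapsedMilliseconds);
     }
 }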

+15

I am not sure how much this improves things, but it has a performance advantage if the list is large and you only need the first few random elements.

 public static IEnumerable<T> RandomiseList<T>(IList<T> list, int seed)
 {
     Random random = new Random(seed);
     List<T> takeFrom = new List<T>(list);

     while (takeFrom.Count > 0)
     {
         // Next's upper bound is exclusive, so pass Count rather than
         // Count - 1, otherwise the last element can never be picked first
         int pos = random.Next(0, takeFrom.Count);
         T item = takeFrom[pos];
         takeFrom.RemoveAt(pos);
         yield return item;
     }
 }

This removes the need for a return list, or even a temporary swap variable.

If I used it a lot, I would rewrite it as an extension method, as sketched below.
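A minimal sketch of that extension-method form (my illustration, reusing the iterator body above; the deck variable in the usage line is hypothetical):

 using System;
 using System.Collections.Generic;

 public static class ListExtensions
 {
     // same lazy iterator as above, now callable directly on any IList<T>
     public static IEnumerable<T> RandomiseList<T>(this IList<T> list, int seed)
     {
         Random random = new Random(seed);
         List<T> takeFrom = new List<T>(list);

         while (takeFrom.Count > 0)
         {
             int pos = random.Next(0, takeFrom.Count); // upper bound is exclusive
             T item = takeFrom[pos];
             takeFrom.RemoveAt(pos);
             yield return item;
         }
     }
 }

 // usage: foreach (var item in deck.RandomiseList(42)) { ... }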

+3

It looks good to me. Note that you will get slightly better performance (especially for large lists) if you initialize ret with the capacity of the source list, so that it is not reallocated as it grows:

 List<T> ret = new List<T>(list.Count); 
+2

What kind of suggestions are you looking for, exactly? Efficiency? Correctness? You mention unit testing... I think there is room for improvement there.

I once helped develop an online game and its shuffling mechanism. I don't suspect performance is a big problem here, since most of the algorithms you will find are basically the same. I would suggest the following:

a. Create a random interface

 public interface IRandom
 {
     byte NextRandomByte();
 }

Anything that consumes this interface can now be mocked and tested in a controlled, repeatable way. You really do not want to unit test truly random algorithms - you would have no way to verify your output!
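For example, a deterministic test double (a hypothetical helper of mine, not part of the original answer) can replay a scripted byte sequence, so a shuffle driven by it produces an exactly verifiable order:

 // deterministic IRandom for unit tests: replays a fixed sequence
 public sealed class ScriptedRandom : IRandom
 {
     private readonly byte[] _script;
     private int _next;

     public ScriptedRandom(params byte[] script)
     {
         _script = script;
     }

     public byte NextRandomByte()
     {
         byte value = _script[_next];
         _next = (_next + 1) % _script.Length; // wrap around when exhausted
         return value;
     }
 }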

As for returning a byte: a byte is probably the smallest unit of randomness you might want. Moreover, given the ability to generate one random byte, producing a sequence of them and splicing them together is an easy way to build up an arbitrarily wide range of random data.

Of course, you will need to be careful not to bias your data...

b. Ensure data quality by removing bias when mapping onto arbitrary intervals. Assuming the underlying data is uniformly random, any interval whose size is NOT a factor of 256 will introduce bias. Consider this:

 // 250 is not a factor of 256!
 byte a = (byte)(random.NextRandomByte() % 250); // values 0-5 are biased!

In the previous fragment, the values 0-5 each occur with probability 2/256, while the values 6-249 each occur with probability 1/256. That is a significant bias over time. One approach is to check the number coming from the generator and discard it if it falls outside the acceptable range:

 // continually generate random data until it is satisfactory
 byte r;
 do
 {
     r = random.NextRandomByte();
 } while (r >= 250);

 byte a = r; // r is guaranteed to be in [0, 250), no more bias

The "valid range" can be determined by finding the largest multiple of your interval, which can be represented by your value type. More generalized form

 byte modulo;        // specified as a parameter
 byte biasThreshold = (byte)((byte.MaxValue / modulo) * modulo);

 byte unbiasedValue;
 do
 {
     // generate a value
     unbiasedValue = random.NextRandomByte();
 } while (unbiasedValue >= biasThreshold);

And if you need values larger than a byte, just splice bytes together:

 int modulo;         // specified as a parameter
 int biasThreshold = (int.MaxValue / modulo) * modulo;

 int unbiasedValue;
 do
 {
     // generate four bytes and combine them into one int;
     // masking the sign bit keeps the result non-negative
     byte a = random.NextRandomByte();
     byte b = random.NextRandomByte();
     byte c = random.NextRandomByte();
     byte d = random.NextRandomByte();
     unbiasedValue = ((a & 0x7F) << 24) | (b << 16) | (c << 8) | d;
 } while (unbiasedValue >= biasThreshold);

c. Consume! Put your algorithms and helpers into extension methods or static classes, like:

 // forgive my syntax, I am recalling this from memory
 public static class IRandomExtensions
 {
     public static int GetUnbiasedInteger(this IRandom random, int modulo) { ... }
     public static uint GetUnbiasedUnsignedInteger(this IRandom random, uint modulo) { ... }
     public static long GetUnbiasedLong(this IRandom random, long modulo) { ... }
     public static ulong GetUnbiasedUnsignedLong(this IRandom random, ulong modulo) { ... }
     ...
 }

 public static class IEnumerableExtensions
 {
     public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> items, IRandom random)
     {
         // shuffle away!
         ...
     }
 }

Whether to implement these as methods on your interface or as extension methods [as I did] is up to you - just keep in mind that making them interface members forces every implementation to repeat or duplicate the code. Personally, I like extensions. They are very clean. And sexy.

 int randomNumber = random.GetUnbiasedInteger(i - 1);
 List<int> shuffledNumbers = numbers.Shuffle(random).ToList();

Obviously, all of the above is optional, but it makes unit testing easier and improves the overall quality of your random data.

Randomness and fair dice are a very interesting topic in general. If you are at all interested, I highly recommend setting aside a day and doing some research. :)

+2

I have no statistics to support this, but it might be faster if the return value starts out as an array of the same length as the list, and you then insert each value directly at a randomly generated index.
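One way to make that concrete (my interpretation of the suggestion, not code from this answer) is the "inside-out" variant of Fisher-Yates, which fills a fresh array by writing each source element at a randomly chosen index:

 using System;
 using System.Collections.Generic;

 public static class ArrayShuffle
 {
     public static T[] RandomiseToArray<T>(IList<T> list, int seed)
     {
         Random random = new Random(seed);
         T[] result = new T[list.Count];

         for (int i = 0; i < list.Count; i++)
         {
             // choose a slot among the first i + 1 positions; whatever
             // occupies it moves to position i before the insert
             int j = random.Next(i + 1);
             result[i] = result[j];
             result[j] = list[i];
         }

         return result;
     }
 }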

0

Be aware of the risks of naive shuffling algorithms that look good but do not stand up to testing!

Check out this great article for an example.
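A typical offender (my illustration of the kind of algorithm such articles warn about) is the "swap every position with a random position" loop. It looks plausible, but it has n^n equally likely execution paths, which cannot map uniformly onto the n! possible orderings:

 // BROKEN: biased shuffle - shown only as a cautionary example
 static void NaiveShuffle<T>(IList<T> items, Random rng)
 {
     for (int i = 0; i < items.Count; i++)
     {
         int j = rng.Next(items.Count); // bug: should be rng.Next(i, items.Count)
         T tmp = items[i];
         items[i] = items[j];
         items[j] = tmp;
     }
 }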

0
