Why does using Random in Sort cause an "IComparer.Compare() returns inconsistent results" error?

I tried to shuffle a list of bytes (a List&lt;byte&gt;) using either of these:

myList.Sort((a, b) => this._Rnd.Next(-1, 1)); 

or

 myList.Sort(delegate(byte b1, byte b2) { return this._Rnd.Next(-1, 1); }); 

and they threw the following error:

Unable to sort because the IComparer.Compare() method returns inconsistent results. Either a value does not compare equal to itself, or one value repeatedly compared to another value yields different results. x: '{0}', x's type: '{1}', IComparer: '{2}'.

What is wrong with using a random value rather than an actual byte comparison?

Instead, I tried these LINQ alternatives:

var myNewList = myList.OrderBy(s => Guid.NewGuid());
var myNewList = myList.OrderBy(s => this._Rnd.NextDouble());

I have read that these methods are slower than a Fisher–Yates shuffle, which is only O(n). But I was mainly curious why using Sort with a random comparison fails.

+6
c# sorting list random linq
3 answers

Because, as the error says, Random is not consistent. A comparer must always return the same result when given the same pair of arguments; otherwise the sort cannot work correctly.
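For contrast, this is what a consistent comparison looks like; the demo class name is mine:

```csharp
using System;
using System.Collections.Generic;

class ConsistentSortDemo
{
    static void Main()
    {
        var myList = new List<byte> { 5, 1, 4, 2, 3 };
        // A valid comparer is deterministic: the same pair of arguments
        // always yields the same sign, and comparing x with x yields 0.
        myList.Sort((a, b) => a.CompareTo(b));
        Console.WriteLine(string.Join(",", myList)); // prints 1,2,3,4,5
    }
}
```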

Knuth describes a shuffle algorithm that works something like an insertion sort, except that instead of inserting, you swap each element with a randomly selected position among the elements processed so far.
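That algorithm (now usually called the Fisher–Yates or Knuth shuffle) looks roughly like this in C#; the class and method names are mine:

```csharp
using System;
using System.Collections.Generic;

static class Shuffler
{
    // In-place Fisher–Yates (Knuth) shuffle: O(n), and every
    // permutation of the list is equally likely.
    public static void Shuffle<T>(IList<T> list, Random rnd)
    {
        for (int i = list.Count - 1; i > 0; i--)
        {
            int j = rnd.Next(i + 1);   // 0 <= j <= i, upper bound inclusive
            T tmp = list[i];
            list[i] = list[j];
            list[j] = tmp;
        }
    }
}
```

Note that `rnd.Next(i + 1)` draws only from the positions not yet fixed, which is what makes the result unbiased.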

+4

Not only must the comparison relation be consistent, it must also impose a total order. For example, you cannot say "socks are less than shoes; shirts are neither less than nor greater than trousers", feed that to a sort algorithm, and expect to get a topological sort out the other end. Comparison sorts are called comparison sorts because they require a well-formed comparison relation. In particular, quicksort can run forever, or give nonsensical results, if the comparison relation is not consistent, transitive, and a total ordering.

If what you want is a shuffle, then implement a Fisher–Yates shuffle. (Do it carefully; although the algorithm is trivial, it is almost always implemented incorrectly.) If what you want is a topological sort, then implement a topological sort. Use the right tool for the job.
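To illustrate the "almost always implemented incorrectly" point: the classic mistake is to pick the swap index from the whole array on every iteration. A sketch of the buggy version, with names of my own, for comparison:

```csharp
using System;
using System.Collections.Generic;

static class NaiveShuffleDemo
{
    // BIASED: swapping with rnd.Next(n) on every pass gives n^n equally
    // likely execution paths, which cannot map evenly onto n! permutations
    // (e.g. 3^3 = 27 is not divisible by 3! = 6), so some orderings come
    // up more often than others.
    public static void NaiveShuffle<T>(IList<T> list, Random rnd)
    {
        for (int i = 0; i < list.Count; i++)
        {
            int j = rnd.Next(list.Count);   // wrong: should be rnd.Next(i + 1)
            (list[i], list[j]) = (list[j], list[i]);
        }
    }
}
```

The output still contains the right elements, which is why the bug is so easy to miss; the bias only shows up in the distribution of permutations.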

+10

Sorting algorithms generally work by defining a comparison function. The algorithm repeatedly compares pairs of elements in the sequence being sorted and swaps them if their current order contradicts the desired order. The differences between algorithms mostly come down to finding the most efficient way, under the given circumstances, to perform all of those comparisons.

In the course of performing all these comparisons, the same two elements may need to be compared more than once! Non-numeric data makes this easier to see: say you have items with the values "Red" and "Apple". The random comparer picks "Apple" as the greater item in the first comparison. Later, the random comparer picks "Red" as the greater item, and the answer keeps flip-flopping, so you can end up in a situation where the algorithm never terminates.

Mostly you get lucky and nothing bad happens. But sometimes you don't. .NET does not just run forever; it guards against this, and it throws an exception (as it should!) when those guards detect an inconsistent ordering.

Of course, the right way to handle this in the general case is a Knuth–Fisher–Yates shuffle.

It is also worth mentioning that there are times when a simple Fisher–Yates shuffle is not suitable, namely when randomizing a sequence of unknown length. Say you want to randomly reorder data arriving from a network stream, without knowing how much data is in the stream, while handing the data off to a worker thread somewhere else as quickly as possible.

In this situation you cannot perfectly randomize the data. Without knowing the stream's length you do not have enough information to shuffle correctly, and even if you did, you might find that buffering it all is impractical in RAM or even on disk. Or the stream might not end for several hours while your worker needs to start much sooner. In that case you would probably settle for (and understanding that it is a "settling" is important) an algorithm that fills a buffer of sufficient length, shuffles the buffer, hands roughly half of it to the worker thread, and then refills the emptied part of the buffer to repeat the process. Even here, you would probably use Knuth–Fisher–Yates for the step that shuffles the buffer.
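A minimal sketch of that buffered approach, assuming the names, the buffer capacity, and the emit-one-replace-one policy are all my own choices (this is an approximation, not a uniform shuffle):

```csharp
using System;
using System.Collections.Generic;

static class StreamShuffler
{
    // Approximate shuffle of a stream of unknown length: keep a buffer of
    // `capacity` items; once it is full, emit a randomly chosen buffered
    // item and put the next incoming item in its place. Each emitted item
    // is drawn at random from a sliding window, not a full permutation.
    public static IEnumerable<T> ShuffleStream<T>(
        IEnumerable<T> source, int capacity, Random rnd)
    {
        var buffer = new List<T>(capacity);
        foreach (var item in source)
        {
            if (buffer.Count < capacity)
            {
                buffer.Add(item);
                continue;
            }
            int j = rnd.Next(buffer.Count);
            yield return buffer[j];
            buffer[j] = item;
        }
        // The stream has ended: finish with a proper Fisher–Yates
        // shuffle of whatever remains in the buffer.
        for (int i = buffer.Count - 1; i > 0; i--)
        {
            int j = rnd.Next(i + 1);
            (buffer[i], buffer[j]) = (buffer[j], buffer[i]);
        }
        foreach (var item in buffer) yield return item;
    }
}
```

A larger `capacity` gets you closer to a true shuffle at the cost of more memory and more latency before the first item is emitted.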

+1
