Update:
I was very interested in this topic, so I sat down and implemented it (using this very fast and memory-conservative implementation). I also read this one (thanks celion) and found out that you don't even need to split the floats into mantissa and exponent to sort them. You just have to take the bits one-to-one and do an int sort. You only have to take care of the negative values, which have to be put, in reverse order, in front of the positive values at the end of the algorithm (I did this in one step with the last iteration of the algorithm to save CPU time).
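To see why that works, here is a tiny standalone sketch (just my illustration, not part of the sort itself): it reinterprets a few floats as ints and prints the raw bit patterns. The positive floats already compare correctly as plain ints, while the negative ones have the sign bit set, so the unsigned radix passes drop them into the highest buckets of the last group and in reverse order among themselves; that is exactly what the fix-up in the last iteration corrects.

using System;

class BitPatternDemo
{
    static void Main()
    {
        // floats and their raw IEEE-754 bit patterns, reinterpreted as int
        float[] samples = { -2f, -1f, -0.5f, 0f, 0.5f, 1f, 2f };
        foreach (float f in samples)
        {
            int bits = BitConverter.ToInt32(BitConverter.GetBytes(f), 0);
            Console.WriteLine(string.Format("{0,4}: 0x{1:X8}", f, bits));
        }
        // -2 prints as 0xC0000000 and -1 as 0xBF800000, i.e. the negatives
        // come out in reverse order, while 0 < 0.5 < 1 < 2 is already correct
    }
}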
So here is my float radixsort:
public static float[] RadixSort(this float[] array)
{
    // temporary array and the array of converted floats to ints
    int[] t = new int[array.Length];
    int[] a = new int[array.Length];
    for (int i = 0; i < array.Length; i++)
        a[i] = BitConverter.ToInt32(BitConverter.GetBytes(array[i]), 0);

    // set the group length to 1, 2, 4, 8 or 16
    // and see which one is quicker
    int groupLength = 4;
    int bitLength = 32;

    // counting and prefix arrays
    // (dimension is 2^r, the number of possible values of a r-bit number)
    int[] count = new int[1 << groupLength];
    int[] pref = new int[1 << groupLength];
    int groups = bitLength / groupLength;
    int mask = (1 << groupLength) - 1;
    int negatives = 0, positives = 0;

    for (int c = 0, shift = 0; c < groups; c++, shift += groupLength)
    {
        // reset count array
        for (int j = 0; j < count.Length; j++)
            count[j] = 0;

        // counting elements of the c-th group
        for (int i = 0; i < a.Length; i++)
        {
            count[(a[i] >> shift) & mask]++;

            // additionally count all negative
            // values in first round
            if (c == 0 && a[i] < 0)
                negatives++;
        }
        if (c == 0) positives = a.Length - negatives;

        // calculating prefixes
        pref[0] = 0;
        for (int i = 1; i < count.Length; i++)
            pref[i] = pref[i - 1] + count[i - 1];

        // from a[] to t[] elements ordered by c-th group
        for (int i = 0; i < a.Length; i++)
        {
            // Get the right index to sort the number in
            int index = pref[(a[i] >> shift) & mask]++;

            if (c == groups - 1)
            {
                // We're in the last (most significant) group, if the
                // number is negative, order them inversely in front
                // of the array, pushing positive ones back.
                if (a[i] < 0)
                    index = positives - (index - negatives) - 1;
                else
                    index += negatives;
            }
            t[index] = a[i];
        }

        // a[]=t[] and start again until the last group
        t.CopyTo(a, 0);
    }

    // Convert back the ints to the float array
    float[] ret = new float[a.Length];
    for (int i = 0; i < a.Length; i++)
        ret[i] = BitConverter.ToSingle(BitConverter.GetBytes(a[i]), 0);

    return ret;
}
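A side note from me: since RadixSort is an extension method, it has to be declared in a non-nested static class, and you need using System; for BitConverter. Assuming that is in place, usage looks like this (just a sketch with made-up values):

float[] data = { 3.5f, -1.25f, float.NaN, 0f, -2e10f };
float[] sorted = data.RadixSort();
Console.WriteLine(string.Join(", ", sorted));  // expected: NaN, -2E+10, -1.25, 0, 3.5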
This is a bit slower than the int radix sort, because of the copying at the beginning and the end of the function, where the floats are bitwise copied to ints and back. The whole function is nevertheless again O(n). In any case, it is much faster than sorting three times in a row, as you proposed. I don't see much room for further optimizations, but if anyone does, feel free to tell me.
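One possible micro-optimization for exactly that conversion step (my suggestion, not measured): on newer runtimes (.NET Core 2.0+ / .NET 5+) BitConverter can reinterpret the bits directly, without allocating a temporary byte array per element:

using System;

class BitsDemo
{
    static void Main()
    {
        float f = -1.25f;

        // instead of BitConverter.ToInt32(BitConverter.GetBytes(f), 0)
        int bits = BitConverter.SingleToInt32Bits(f);

        // instead of BitConverter.ToSingle(BitConverter.GetBytes(bits), 0)
        float back = BitConverter.Int32BitsToSingle(bits);

        Console.WriteLine(string.Format("0x{0:X8} -> {1}", bits, back));  // 0xBFA00000 -> -1.25
    }
}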
To sort in descending order, change this line at the very end:
ret[i] = BitConverter.ToSingle(BitConverter.GetBytes(a[i]), 0);
to this:
ret[a.Length - i - 1] = BitConverter.ToSingle(BitConverter.GetBytes(a[i]), 0);
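Equivalently (just an alternative from me, not in the original code), you can keep the ascending sort and reverse the result afterwards; NaN then also ends up at the very end:

// values stands for whatever float[] you want sorted in descending order
float[] descending = values.RadixSort();
Array.Reverse(descending);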
Measurement:
I set up a small test with all the special floating-point cases (NaN, +/-Inf, min/max value, 0) plus random numbers. It sorts in exactly the same order as Linq or Array.Sort sort floating-point values:
NaN -> -Inf -> Min -> Negative Nums -> 0 -> Positive Nums -> Max -> +Inf
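Here is a small self-contained check of that claim (my addition, not the original test code). It sorts an array containing the special values with both methods and compares element by element; float.Equals is used instead of == so that the NaN entries compare equal to each other:

using System;
using System.Linq;

class OrderCheck
{
    static void Main()
    {
        float[] values = { 1f, float.NaN, -3f, float.PositiveInfinity, 0f,
                           float.NegativeInfinity, float.MinValue, float.MaxValue, -0.5f };

        float[] viaRadix = values.RadixSort();        // extension method from above
        float[] viaArraySort = (float[])values.Clone();
        Array.Sort(viaArraySort);

        bool same = viaRadix.Zip(viaArraySort, (x, y) => x.Equals(y)).All(b => b);
        Console.WriteLine(same);                      // expected: True
    }
}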
So, I conducted a test with a huge array of 10 million numbers:
float[] test = new float[10000000];
Random rnd = new Random();
for (int i = 0; i < test.Length; i++)
{
    byte[] buffer = new byte[4];
    rnd.NextBytes(buffer);
    float rndfloat = BitConverter.ToSingle(buffer, 0);
    switch (i)
    {
        case 0: { test[i] = float.MaxValue; break; }
        case 1: { test[i] = float.MinValue; break; }
        case 2: { test[i] = float.NaN; break; }
        case 3: { test[i] = float.NegativeInfinity; break; }
        case 4: { test[i] = float.PositiveInfinity; break; }
        case 5: { test[i] = 0f; break; }
        default: { test[i] = rndfloat; break; }
    }
}
And timed the different sorting algorithms:
Stopwatch sw = new Stopwatch();

sw.Start();
float[] sorted1 = test.RadixSort();
sw.Stop();
Console.WriteLine(string.Format("RadixSort: {0}", sw.Elapsed));

sw.Reset();
sw.Start();
float[] sorted2 = test.OrderBy(x => x).ToArray();
sw.Stop();
Console.WriteLine(string.Format("Linq OrderBy: {0}", sw.Elapsed));

sw.Reset();
sw.Start();
Array.Sort(test);
float[] sorted3 = test;
sw.Stop();
Console.WriteLine(string.Format("Array.Sort: {0}", sw.Elapsed));
And the result was (update: now compiled as a release build, not debug):
RadixSort:    00:00:03.9902332
Linq OrderBy: 00:00:17.4983272
Array.Sort:   00:00:03.1536785
Roughly four times faster than Linq. That's not bad. Still not as fast as Array.Sort, but not that much worse either. What really surprised me, though, was this: I expected it to be slightly slower than Linq on very small arrays. But then I ran a test with just 20 elements:
RadixSort:    00:00:00.0012944
Linq OrderBy: 00:00:00.0072271
Array.Sort:   00:00:00.0002979
and even at this size my radix sort is faster than Linq, though much slower than Array.Sort. :)
Update 2:
I took some more measurements and found a few interesting things: longer group length constants mean fewer iterations but more memory usage. If you use a group length of 16 bits (only 2 iterations), you have a huge overhead when sorting small arrays, but you can beat Array.Sort on arrays larger than about 100k elements, even if not by much. The axes of the chart are logarithmic:

(chart: sorting times on logarithmic axes; source: daubmeier.de)
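For reference, here is how the group length constant maps to passes and bucket memory; this is just derived from the constants in RadixSort above (groups = 32 / groupLength, count[] and pref[] each hold 2^groupLength ints), not measured:

using System;

class GroupLengthTradeoff
{
    static void Main()
    {
        foreach (int groupLength in new[] { 1, 2, 4, 8, 16 })
        {
            int passes = 32 / groupLength;    // iterations over the whole array
            int buckets = 1 << groupLength;   // size of count[] and pref[]
            Console.WriteLine(string.Format(
                "groupLength {0,2}: {1,2} passes, {2,5} buckets per pass",
                groupLength, passes, buckets));
        }
    }
}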