What is the best way to maintain or measure how well a collection is sorted, so that we can choose the best sorting algorithm?

Inspired by this question

Choosing which algorithm to use to sort a collection may be easier if we know in advance how well the collection is already sorted. Is there a way to measure (or maintain a measurement of) how well a collection is sorted? And can we do this in such a way that the cost of maintaining or measuring sortedness does not outweigh the advantage of choosing the best sorting algorithm?

+4
8 answers

An addition to @Doug's answer:

Deletion cannot make the list less sorted, so you don't need to track deletes.

When an insert occurs, compare the new element with its neighbors to determine whether it arrived in order. If it did, leave the counter alone; if not, increment an "unsorted" counter.

Perhaps that is too heavy a penalty (i.e. two comparisons per insert). Could you get away with a single comparison for a fuzzier result? Otherwise, I like the idea of simply counting inserts.
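A minimal sketch of this idea in Java (the class and field names are purely illustrative): on each insert, compare the new element with its neighbors and bump a counter when it lands out of order.

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical wrapper that tracks how "unsorted" the list has become.
    class SortednessTrackingList {
        private final List<Integer> items = new ArrayList<>();
        private int unsorted = 0; // inserts that landed out of order

        void insertAt(int index, int value) {
            // Two comparisons per insert: against the left and right neighbors.
            boolean inOrder =
                (index == 0 || items.get(index - 1) <= value) &&
                (index >= items.size() || value <= items.get(index));
            if (!inOrder) {
                unsorted++;
            }
            items.add(index, value);
        }

        int unsortedCount() { return unsorted; }
    }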

+3

You could use sampling: check N items evenly spaced through the list and see how many of them are in order. (Of course, this only works on a random-access list, but that is usually the kind you sort.)
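A sketch of that sampling idea (assuming a random-access array; the function name is made up): check n evenly spaced adjacent pairs and report the fraction that are in order.

    // Estimate sortedness by sampling n evenly spaced adjacent pairs.
    // Returns a value in [0, 1]; 1.0 means every sampled pair was in order.
    static double sampledSortedness(int[] a, int n) {
        if (a.length < 2 || n < 1) return 1.0;
        int inOrder = 0;
        for (int k = 0; k < n; k++) {
            // spread sample indices evenly over [0, a.length - 2]
            int i = (int) ((long) k * (a.length - 1) / n);
            if (a[i] <= a[i + 1]) inOrder++;
        }
        return (double) inOrder / n;
    }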

There is also a threshold for small N. If N is small (for example, 10), insertion sort is good even if the list is not sorted. Java applies this optimization for small N in what would otherwise be a merge sort.

+2

One simple solution:

Maintain a count of the operations (inserts/deletes) performed since the last sort. The larger the count, the more unsorted the collection is likely to be.
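A sketch of that bookkeeping (the class name and the 10% threshold are illustrative assumptions): count mutations since the last sort and use the count to pick a strategy.

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    // Hypothetical: the more operations since the last sort, the more
    // "unsorted" we assume the collection is.
    class OpCountingList {
        private final List<Integer> items = new ArrayList<>();
        private int opsSinceSort = 0;

        void add(int v)      { items.add(v); opsSinceSort++; }
        void removeAt(int i) { items.remove(i); opsSinceSort++; }

        void sortIfNeeded() {
            // Little churn: insertion sort is close to O(n) on nearly sorted data.
            // Heavy churn: fall back to a general-purpose sort.
            if (opsSinceSort * 10 < items.size()) {
                insertionSort(items);
            } else {
                Collections.sort(items);
            }
            opsSinceSort = 0;
        }

        private static void insertionSort(List<Integer> a) {
            for (int i = 1; i < a.size(); i++) {
                int v = a.get(i), j = i - 1;
                while (j >= 0 && a.get(j) > v) { a.set(j + 1, a.get(j)); j--; }
                a.set(j + 1, v);
            }
        }
    }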

+2

You can measure the frequency content of the data: if there are many big changes from element to element, the data is high-frequency, which suggests a fairly random distribution.

If the changes are smaller, the data is low-frequency, which suggests a non-random distribution.

You can also measure the overall trend with a filter: is the average movement up or down? If it is downward, you might consider reversing the entire array, or using a sort tuned for "reversed" data.

There are other measures you could use to get a feel for the data: look into signal processing and see what you can borrow from it.
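A sketch of both measurements (the function name is made up): the mean absolute difference between neighbors as a crude "frequency" measure, and the mean signed difference as the overall trend; a strongly negative trend suggests reversing the array first.

    // Crude signal-style measures over adjacent elements.
    // frequency: average |a[i+1] - a[i]|  (large => noisy/random data)
    // trend:     average (a[i+1] - a[i])  (negative => mostly descending)
    static double[] frequencyAndTrend(int[] a) {
        if (a.length < 2) return new double[] { 0.0, 0.0 };
        double absSum = 0, signedSum = 0;
        for (int i = 0; i < a.length - 1; i++) {
            int d = a[i + 1] - a[i];
            absSum += Math.abs(d);
            signedSum += d;
        }
        int pairs = a.length - 1;
        return new double[] { absSum / pairs, signedSum / pairs };
    }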

-Adam

+2

There is an Introspective Sort (introsort) that does exactly this...

http://ralphunden.net/content/tutorials/a-guide-to-introsort/
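For reference, a compact introsort sketch (my own illustration, not code from the linked guide): plain quicksort, but bail out to heapsort when recursion gets suspiciously deep, and finish small ranges with insertion sort.

    // Introsort sketch: quicksort with a depth limit; heapsort as the
    // fallback, insertion sort for small ranges. Illustrative only.
    public class IntroSortSketch {
        private static final int SMALL = 16;

        public static void sort(int[] a) {
            // roughly 2*log2(n) levels before we suspect a bad pivot sequence
            int depthLimit = 2 * (32 - Integer.numberOfLeadingZeros(Math.max(a.length, 1)));
            sort(a, 0, a.length - 1, depthLimit);
        }

        private static void sort(int[] a, int lo, int hi, int depth) {
            while (hi - lo > SMALL) {
                if (depth-- == 0) {        // too deep: quicksort is misbehaving
                    heapSort(a, lo, hi);
                    return;
                }
                int p = partition(a, lo, hi);
                sort(a, p + 1, hi, depth); // recurse right, loop on the left
                hi = p - 1;
            }
            insertionSort(a, lo, hi);
        }

        private static int partition(int[] a, int lo, int hi) {
            int pivot = a[hi];             // last element as pivot, for brevity
            int i = lo;
            for (int j = lo; j < hi; j++) {
                if (a[j] < pivot) { swap(a, i++, j); }
            }
            swap(a, i, hi);
            return i;
        }

        private static void heapSort(int[] a, int lo, int hi) {
            int n = hi - lo + 1;
            for (int i = n / 2 - 1; i >= 0; i--) siftDown(a, lo, i, n);
            for (int end = n - 1; end > 0; end--) {
                swap(a, lo, lo + end);
                siftDown(a, lo, 0, end);
            }
        }

        private static void siftDown(int[] a, int lo, int root, int n) {
            while (true) {
                int child = 2 * root + 1;
                if (child >= n) return;
                if (child + 1 < n && a[lo + child + 1] > a[lo + child]) child++;
                if (a[lo + root] >= a[lo + child]) return;
                swap(a, lo + root, lo + child);
                root = child;
            }
        }

        private static void insertionSort(int[] a, int lo, int hi) {
            for (int i = lo + 1; i <= hi; i++) {
                int v = a[i], j = i - 1;
                while (j >= lo && a[j] > v) { a[j + 1] = a[j]; j--; }
                a[j + 1] = v;
            }
        }

        private static void swap(int[] a, int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }
    }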

+2

If you don't know anything a priori about the collection, any time taken to measure its sortedness will cost far more than the savings you would get from choosing the optimal sorting algorithm.

If, on the other hand, you are going to sort many data sets that are all sorted to a similar degree, you can measure the first data set, select an algorithm, and then use it for all subsequent data sets.

+1

OK, first check whether the collection is sorted by definition; that will always save you a lot of time :) For the most part, don't bother extending the collection to check whether it stays sorted across insert/delete operations: if the collection needs to be sorted, use a collection that is sorted a priori.

If you do want to extend a collection class to track sortedness, just keep a separate, sorted list of pointers to the items in the collection...
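One way to do that in Java (a hypothetical sketch; note that a TreeSet drops duplicates, so a real version might need a TreeMap of counts instead):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.TreeSet;

    // Keep the collection in insertion order, plus an always-sorted view.
    class DualViewCollection<T extends Comparable<T>> {
        private final List<T> items = new ArrayList<>();       // as inserted
        private final TreeSet<T> sortedView = new TreeSet<>(); // kept sorted

        void add(T item)    { items.add(item); sortedView.add(item); }
        void remove(T item) { items.remove(item); sortedView.remove(item); }

        Iterable<T> insertionOrder() { return items; }
        Iterable<T> sortedOrder()    { return sortedView; }
    }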

Finally, 99.99% of the time, why bother? Just use quicksort. If your data set is so small that the constant factors hidden in quicksort's big-O would lose out to bubble sort, the sort will be so fast that it isn't worth spending time on the question.

Are you really telling me that the sort you need falls into that remaining 0.01%?

0

This is a great question. My approach was to ask: given the list of elements, what is the probability that two consecutive elements taken from the list are in sorted order? As the list becomes more sorted, that probability approaches 100%.

Calculating this probability is relatively simple:

    int sorted = 0;
    // compare each adjacent pair; stop at list_length - 1 so we never
    // read past the end of the list
    for (int i = 0; i < list_length - 1; i++) {
        if (list[i + 1] >= list[i]) {
            sorted++;
        }
    }
    // use floating-point division, otherwise the result is just 0 or 1
    double sortedness = (double) sorted / (list_length - 1);

Hope this helps!

0
