When to use non-comparison sorting versus comparison sorting

In class, we learned about several non-comparison sorts that avoid the Omega(n log n) lower bound that applies to all comparison-based sorts. What was not clear to me is what the pros and cons are when choosing between the two families of sorting algorithms.

Can't any data set be tweaked so that a non-comparison sorting algorithm (radix, bucket, key-indexed) can be used on it? If so, what is the point of comparison sorts even existing?

Sorry for asking such an elementary question, but I really cannot find anything about this online.

+4
4 answers

Not every set of items can be tweaked to work efficiently with non-comparison sorts. For example, sorting arbitrary-precision numbers would require running the loop inside the bucket sort many times, which kills performance.

The problem with the radix sorts of the world is that they must examine every sub-element of every item being sorted. Comparison-based sorts, on the other hand, can skip a fair number of sub-elements (digits, characters, etc.). For example, when a comparison function checks two strings, it stops at the first difference and skips the tails of both strings. Bucket sort, by contrast, must examine all the characters in every string*.

In general, chasing the best asymptotic complexity is not always a good strategy: the value of N at which a significantly more complex algorithm pays off is often too large for that algorithm to be practical. For example, quicksort has a very poor worst-case time complexity, O(n^2), yet on average it beats most other algorithms thanks to its very low overhead, which makes it a good choice in most practical situations.


* In practice, implementations of bucket sort avoid having to look at all sub-elements (digits, characters, etc.) by switching to a comparison-based sort as soon as the number of items in a bucket drops below a certain threshold. This hybrid approach beats both a plain comparison-based sort and a plain bucket sort.
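The hybrid described in the footnote can be sketched roughly as follows. This is only an illustration, not how any particular library does it: the function name `hybrid_msd_sort` and the default `threshold` of 16 are assumptions made up for the example, and the bucket keys themselves are ordered with the built-in `sorted`, which is cheap when the alphabet is small.

```python
def hybrid_msd_sort(strings, depth=0, threshold=16):
    """Sort strings by bucketing on the character at index `depth`.

    Buckets with at most `threshold` items are handed to the built-in
    comparison sort instead of being split further.
    """
    if len(strings) <= threshold:
        # Few items: a comparison sort stops at the first differing
        # character, so the tails of the strings are never examined.
        return sorted(strings)

    finished = []  # strings that end exactly at `depth` (all identical)
    buckets = {}   # character -> strings having that character at `depth`
    for s in strings:
        if len(s) == depth:
            finished.append(s)
        else:
            buckets.setdefault(s[depth], []).append(s)

    result = finished  # a shorter string sorts before its extensions
    for ch in sorted(buckets):
        result.extend(hybrid_msd_sort(buckets[ch], depth + 1, threshold))
    return result


print(hybrid_msd_sort(["pear", "peach", "apple", "pea", "apricot"], threshold=2))
```

The recursion goes one level deeper per shared prefix character, so very long common prefixes in large buckets would need an iterative version.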
+2

The problem with non-comparison sorts is that their complexity usually depends on parameters other than the size of the input. Radix sort, for example, has O(kn) complexity, where k is the largest number of digits in any element. The question is how k relates to n: if k is about the same as n, the algorithm becomes O(n^2).
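To make the role of k concrete, here is a minimal least-significant-digit radix sort for non-negative integers. It is a sketch under assumptions: the name `lsd_radix_sort` and the returned pass count are there only for illustration. It performs one full pass over all n items per digit of the largest value, so the pass count is exactly the k above.

```python
def lsd_radix_sort(values, base=10):
    """Sort non-negative integers; return (sorted_list, number_of_passes)."""
    items = list(values)
    if not items:
        return [], 0

    largest = max(items)
    place = 1   # 1, base, base**2, ... selects the current digit
    passes = 0
    while place <= largest:
        # One stable bucketing pass over all n items for this digit.
        buckets = [[] for _ in range(base)]
        for v in items:
            buckets[(v // place) % base].append(v)
        items = [v for bucket in buckets for v in bucket]
        place *= base
        passes += 1
    return items, passes


ordered, k = lsd_radix_sort([170, 45, 75, 90, 802, 24, 2, 66])
print(ordered, k)  # [2, 24, 45, 66, 75, 90, 170, 802] 3
```

Adding a single 50-digit number to that input would raise the pass count to 50 for every element, which is the dependence on k that the answer is pointing at.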

+1

Non-comparison sorting algorithms make assumptions about the input. All input elements must fall within a range of constant length to guarantee linear time complexity. Comparison-based sorting algorithms, on the other hand, make no assumptions about the input and can handle any case. Non-comparison sorting algorithms often come at the expense of extra memory and a loss of generality in the inputs they accept.
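A key-indexed counting sort shows both costs at once. In this sketch (the name `counting_sort` and the `max_key` parameter are assumptions chosen for the example) the caller has to promise that every key lies in 0..max_key, and the algorithm allocates a table whose size depends on that range rather than on the number of items.

```python
def counting_sort(values, max_key):
    """Sort integers known to lie in 0..max_key.

    Runs in O(n + max_key) time and needs O(max_key) extra memory.
    """
    counts = [0] * (max_key + 1)  # extra memory proportional to the key range
    for v in values:
        if not 0 <= v <= max_key:
            # The input assumption is violated; the algorithm cannot proceed.
            raise ValueError(f"key {v} outside 0..{max_key}")
        counts[v] += 1

    result = []
    for key, count in enumerate(counts):
        result.extend([key] * count)
    return result


print(counting_sort([3, 1, 2, 3, 0], max_key=3))  # [0, 1, 2, 3, 3]
```

Sorting five 64-bit integers this way would need a table with 2^64 entries, which is why the constant-range assumption matters.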

+1

You use a comparison-based sort when you are too lazy to write a non-comparison sort.

Comparison-based sorts are inherently slower; they have to call the comparator on the input elements a whole bunch of times, and each call gives the sort only one bit of information. A correct comparison-based sort has to accumulate, on average, log_2(n!) ~ n log_2(n) bits of information about its input.
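One way to see the bit-counting argument is to count comparator calls and compare them with log_2(n!). The helpers below, `count_comparisons` and `information_lower_bound`, are names invented for this sketch. It relies on `functools.cmp_to_key`, which lets Python's built-in sort call a comparator whose result it only uses as a yes/no answer.

```python
import math
from functools import cmp_to_key
from itertools import permutations


def count_comparisons(values):
    """Sort `values` and return (sorted_list, number_of_comparator_calls)."""
    calls = 0

    def compare(a, b):
        nonlocal calls
        calls += 1
        return (a > b) - (a < b)

    return sorted(values, key=cmp_to_key(compare)), calls


def information_lower_bound(n):
    """ceil(log2(n!)): comparisons any comparison sort needs in the worst case."""
    return math.ceil(math.log2(math.factorial(n)))


# Try every ordering of 6 distinct items and look at the worst case.
worst = max(count_comparisons(list(p))[1] for p in permutations(range(6)))
print(worst, ">=", information_lower_bound(6))
```

For 6 items there are 720 orderings, so at least 10 yes/no answers are needed in the worst case, whichever comparison sort is used.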

Now, all data has a representation in the machine. You can tailor a sorting algorithm to a specific data type, its representation, and the machine you are sorting on, and if you know what you are doing, you will often beat the pants off any comparison-based sorting algorithm.

However, performance is not everything, and there are cases (most of the cases I have seen, in fact) where the most performant solution is not the right solution. Good comparison-based sorts can take a black-box comparator, and they will sort the input in a small multiple of n log(n) time. That is good enough for almost all applications.

EDIT: The above really only applies to internal sorting, where you have more than enough RAM to hold all of the input. External sorting (spilling to disk, say) should generally be done by reading in about half the RAM size worth of data at a time, sorting it with a non-comparison sort, and writing out the sorted result. Take care to overlap sorting with input and output all the time. At the end, you do an n-way merge (comparison-based).
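The two phases of that external sort might look something like the sketch below. It is simplified on purpose: the name `external_sort` and the `chunk_size` parameter are assumptions for the example, each chunk is sorted with `list.sort` as a stand-in for the non-comparison sort suggested above, and nothing here overlaps sorting with I/O. The final n-way merge uses `heapq.merge` from the standard library.

```python
import heapq
import tempfile
from itertools import islice


def external_sort(numbers, chunk_size):
    """Yield the integers from `numbers` in ascending order.

    Only `chunk_size` values are held in memory while the sorted runs
    are being created; each run is spilled to a temporary file.
    """
    runs = []
    source = iter(numbers)
    while True:
        chunk = list(islice(source, chunk_size))
        if not chunk:
            break
        chunk.sort()  # stand-in for a non-comparison sort on the chunk
        run = tempfile.TemporaryFile(mode="w+")
        run.writelines(f"{n}\n" for n in chunk)
        run.seek(0)
        runs.append(run)

    try:
        # n-way, comparison-based merge that reads each run lazily.
        yield from heapq.merge(*(map(int, run) for run in runs))
    finally:
        for run in runs:
            run.close()


print(list(external_sort([5, 3, 9, 1, 4, 8, 2], chunk_size=3)))
# [1, 2, 3, 4, 5, 8, 9]
```

In a real external sort the chunk size would be derived from available RAM and the runs would use a binary format, but the shape of the algorithm stays the same.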

+1
