What is the fastest algorithm for sorting a linked list?

I am curious whether O(n log n) is the best a linked list can do.

+81
sorting linked-list algorithm complexity-theory
Oct 06 '09 at 11:51
11 answers

It is reasonable to expect that you cannot do better than O(N log N) in running time.

However, the interesting part is to investigate whether you can sort it in place, stably, what its worst-case behavior is, and so on.

Simon Tatham, of PuTTY fame, explains how to sort a linked list with merge sort. He concludes with the following remarks:

Like any self-respecting sort algorithm, this has running time O(N log N). Because this is Mergesort, the worst-case running time is still O(N log N); there are no pathological cases.

The auxiliary storage requirement is small and constant (i.e. a few variables within the sorting routine). Thanks to the inherently different behavior of linked lists from arrays, this Mergesort implementation avoids the O(N) auxiliary storage cost normally associated with the algorithm.

There is also an example implementation in C that works for both singly and doubly linked lists.
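For illustration, here is a minimal sketch of merge sort on a singly linked list (this is not Simon Tatham's code; the node type and function names are made up). It uses only a few pointer variables plus O(log N) recursion depth; Tatham's iterative bottom-up version, which merges sublists of doubling length, avoids even that.

#include <stddef.h>

struct node {
    int value;
    struct node *next;
};

/* Merge two already-sorted lists, stably: ties are taken from 'a' first. */
static struct node *merge(struct node *a, struct node *b)
{
    struct node head = { 0, NULL };   /* dummy head simplifies the loop */
    struct node *tail = &head;

    while (a && b) {
        if (b->value < a->value) { tail->next = b; b = b->next; }
        else                     { tail->next = a; a = a->next; }
        tail = tail->next;
    }
    tail->next = a ? a : b;           /* append whichever half remains */
    return head.next;
}

/* Recursive merge sort: split in the middle with slow/fast pointers. */
struct node *merge_sort(struct node *list)
{
    if (!list || !list->next)
        return list;

    struct node *slow = list, *fast = list->next;
    while (fast && fast->next) {      /* fast advances two nodes per step */
        slow = slow->next;
        fast = fast->next->next;
    }
    struct node *right = slow->next;  /* cut the list into two halves */
    slow->next = NULL;

    return merge(merge_sort(list), merge_sort(right));
}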

As @Jørgen Fogh mentions below, big-O notation may hide some constant factors that can cause one algorithm to perform better because of memory locality, because of a low number of items, etc.

+83
06 Oct '09 at 12:05

Depending on a number of factors, it may be faster to copy the list into an array and then use Quicksort .

The reason this might be faster is that an array has much better cache performance than a linked list. If the nodes in the list are dispersed in memory, you may be generating cache misses all over the place. Then again, if the array is large, you will get cache misses anyway.

Mergesort parallelises better, so it may be a better choice if that is what you want. It is also much faster if you perform it directly on the linked list.

Since both algorithms run in O(n * log n), making an informed decision would involve profiling them both on the machine you would like to run them on.
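A rough sketch of the copy-to-an-array variant (the node type and function name are hypothetical, not taken from the benchmark below):

#include <stdlib.h>

struct node { int value; struct node *next; };

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);      /* avoids overflow of plain x - y */
}

/* Sorts the values of an n-node list in place; returns -1 if malloc fails. */
int sort_via_array(struct node *list, size_t n)
{
    int *buf = malloc(n * sizeof *buf);
    if (!buf)
        return -1;

    size_t i = 0;
    for (struct node *it = list; it; it = it->next)
        buf[i++] = it->value;      /* O(N): copy values out */

    qsort(buf, n, sizeof *buf, cmp_int);

    i = 0;
    for (struct node *it = list; it; it = it->next)
        it->value = buf[i++];      /* O(N): copy sorted values back */

    free(buf);
    return 0;
}

Note that this sorts the stored values rather than relinking the nodes, which is fine when the payload is small but not when nodes must keep their identity.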

--- EDIT

I decided to test my hypothesis and wrote a C program which measured the time (using clock()) taken to sort a linked list of ints. I tried with a linked list where each node was allocated with malloc() and a linked list where the nodes were laid out linearly in an array, so the cache performance would be better. I compared these with the built-in qsort, which included copying everything from the fragmented list to an array and copying the result back again. Each algorithm was run on the same 10 data sets, and the results were averaged.

Here are the results:

N = 1000:
Fragmented list with merge sort: 0.000000 seconds
Array with qsort: 0.000000 seconds
Packed list with merge sort: 0.000000 seconds

N = 100000:
Fragmented list with merge sort: 0.039000 seconds
Array with qsort: 0.025000 seconds
Packed list with merge sort: 0.009000 seconds

N = 1,000,000:
Fragmented list with merge sort: 1.162000 seconds
Array with qsort: 0.420000 seconds
Packed list with merge sort: 0.112000 seconds

N = 100,000,000:
Fragmented list with merge sort: 364.797000 seconds
Array with qsort: 61.166000 seconds
Packed list with merge sort: 16.525000 seconds

Conclusion:

At least on my machine, copying into an array is well worth it to improve cache performance, since you rarely have a completely packed linked list in real life. It should be noted that my machine has a Phenom II at 2.8 GHz, but only 0.6 GHz RAM, so the cache is very important.

+60
Oct 06 '09 at 12:57

Comparison sorts (i.e. ones based on comparing elements) cannot possibly be faster than n log n. It doesn't matter what the underlying data structure is. See Wikipedia.

Other kinds of sorts that take advantage of there being lots of identical elements in the list (such as counting sort), or some expected distribution of elements in the list, are faster, though I can't think of any that work particularly well on a linked list.

+6
Oct 06 '09 at 12:01

As stated many times, the lower bound on comparison-based sorting of general data is going to be O(n log n). To briefly resummarize the argument: there are n! different ways a list can be ordered. Any comparison tree that must distinguish all n! (which is in O(n^n)) possible final orderings needs at least log(n!) as its height: this gives you a lower bound of O(log(n^n)), which is O(n log n).
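To spell out the last step (a standard bound, not part of the original answer), log(n!) really is of order n log n:

\[
\log_2(n!) = \sum_{k=1}^{n} \log_2 k \le n \log_2 n,
\qquad
\log_2(n!) \ge \sum_{k=\lceil n/2 \rceil}^{n} \log_2 k \ge \tfrac{n}{2} \log_2 \tfrac{n}{2},
\]

so log2(n!) = Θ(n log n), and a binary decision tree with at least n! leaves must have height at least log2(n!) = Ω(n log n).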

So, for general data on a linked list, the best possible sort that works on any data able to compare two objects is going to be O(n log n). However, if you have a more limited domain to work in, you can improve the time it takes (at least proportionally to n). For instance, if you are working with integers no larger than some value, you could use Counting Sort or Radix Sort, since these use the specific objects being sorted to reduce the complexity with respect to n. Be careful, though: these add some other factors to the complexity that you may not consider (for instance, Counting Sort and Radix Sort both add factors based on the size of the numbers being sorted, O(n + k) where k is the size of the largest number for Counting Sort, for example).

Also, if you happen to have objects with a perfect hash (or at least a hash that maps all values differently), you could try using a counting or radix sort on their hash values.

+5
Oct 06 '09 at 18:12

Here is a nice little paper on this topic. Its empirical conclusion is that Treesort is best, followed by Quicksort and Mergesort. Sediment sort, bubble sort and selection sort perform very badly.

A COMPARATIVE STUDY OF LINKED LIST SORTING ALGORITHMS by Ching-Kuang Shene

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.31.9981

+5
Dec 29 '10 at 17:56

A Radix sort is particularly suited to a linked list, since it is easy to make a table of head pointers corresponding to each possible value of a digit.
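A minimal sketch of that idea (hypothetical node type; a least-significant-digit radix sort with 8-bit digits over 32-bit keys):

#include <stddef.h>
#include <stdint.h>

struct node { uint32_t value; struct node *next; };

struct node *radix_sort(struct node *list)
{
    /* Four stable passes of 8 bits each cover a 32-bit key. */
    for (unsigned shift = 0; shift < 32; shift += 8) {
        struct node *head[256] = { 0 };
        struct node *tail[256] = { 0 };

        /* Distribute the nodes into buckets keyed by the current digit. */
        while (list) {
            struct node *next = list->next;
            unsigned digit = (list->value >> shift) & 0xFF;
            if (head[digit])
                tail[digit]->next = list;
            else
                head[digit] = list;
            tail[digit] = list;
            list->next = NULL;
            list = next;
        }

        /* Relink the buckets in digit order; stability makes LSD radix correct. */
        struct node dummy = { 0, NULL };
        struct node *end = &dummy;
        for (int d = 0; d < 256; ++d) {
            if (!head[d])
                continue;
            end->next = head[d];
            end = tail[d];
        }
        list = dummy.next;
    }
    return list;
}

Each pass is O(N), so the whole sort is O(N) for fixed-width keys; the head[] array is exactly the table of head pointers per digit value described above.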

+3
Oct 06 '09 at 18:25

Merge sort doesn't require O(1) access and is O(n ln n). No known algorithm for sorting general data is better than O(n ln n).

Special-data algorithms such as radix sort (which limits the size of the data) or histogram sort (which counts discrete data) could sort a linked list with a lower growth function, as long as you use a different structure with O(1) access as temporary storage.

Another class of special data is a comparison sort of an almost-sorted list with k elements out of order. This can be sorted in O(kn) operations.

Copying the list to and from the array will be O (N), so any sorting algorithm can be used if space is not a problem.

For example, if the linked list contains uint8_t, this code will sort it in O(N) time using a histogram sort:

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct _list list_t;
struct _list {
    uint8_t value;
    list_t  *next;
};

list_t *sort_list(list_t *list)
{
    list_t *heads[256] = { 0 };
    list_t *tails[256] = { 0 };

    // O(N) loop: distribute each node into the bucket for its value
    for (list_t *it = list; it != 0; it = it->next) {
        if (heads[it->value] == 0)
            heads[it->value] = it;
        else
            tails[it->value]->next = it;
        tails[it->value] = it;
    }

    list_t *result = 0;

    // constant-time loop: concatenate the buckets from highest value to lowest
    for (size_t i = 256; i-- > 0; ) {
        if (tails[i]) {
            tails[i]->next = result;
            result = heads[i];
        }
    }

    return result;
}

list_t *make_list(char *string)
{
    list_t head = { 0 };
    for (list_t *it = &head; *string; it = it->next, ++string) {
        it->next = malloc(sizeof(list_t));
        it->next->value = (uint8_t)*string;
        it->next->next = 0;
    }
    return head.next;
}

void free_list(list_t *list)
{
    for (list_t *it = list; it != 0; ) {
        list_t *next = it->next;
        free(it);
        it = next;
    }
}

void print_list(list_t *list)
{
    printf("[ ");
    if (list) {
        printf("%c", list->value);
        for (list_t *it = list->next; it != 0; it = it->next)
            printf(", %c", it->value);
    }
    printf(" ]\n");
}

int main(int nargs, char **args)
{
    list_t *list = make_list(nargs > 1 ? args[1] : "wibble");
    print_list(list);
    list_t *sorted = sort_list(list);
    print_list(sorted);
    free_list(sorted);
}
+2
Oct 06 '09 at 11:56

This is not a direct answer to your question, but if you use a Skip List, it is already sorted and has O(log N) search time.

+1
Oct. 06 '09 at 11:53

Mergesort is the best you can do here.

+1
Oct 06 '09 at 11:53

As far as I know, the best sorting algorithm is O(n * log n), whatever the container: it has been proved that sorting in the broad sense of the word (mergesort/quicksort-style, etc.) cannot go lower. Using a linked list will not give you a better running time.

The only kind of algorithm that runs in O(n) is a "hack" algorithm that relies on counting values rather than actually sorting.

+1
06 Oct '09 at 11:54

Here is an implementation that traverses the list just once, collecting runs, then schedules the merges in the same way mergesort does.

Complexity is O(n log m), where n is the number of items and m is the number of runs. The best case is O(n) (if the data is already sorted) and the worst case is O(n log n), as expected.

It requires O(log m) temporary memory; the sort is done in place on the lists.

(Updated below. A commenter makes a good point that I should describe it here.)

The essence of the algorithm:

  while list not empty
      accumulate a run from the start of the list
      merge the run with a stack of merges that simulate mergesort's recursion
  merge all remaining items on the stack

Accumulating runs doesn't require much explanation, but it is good to take the opportunity to accumulate both ascending runs and descending runs (reversed). Here it prepends elements smaller than the head of the run and appends elements greater than or equal to the end of the run. (Note that prepending should use strict less-than to preserve sort stability.)
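A rough C sketch of that run accumulation (hypothetical node type and names, not the answer's actual code):

#include <stddef.h>

struct node { int value; struct node *next; };

/* Detach one maximal run from the front of *listp and return its head.
 * Elements >= the run's tail are appended; elements strictly smaller than
 * the run's head are prepended (reversing a descending run); the strict
 * comparison on the prepend side preserves stability. */
struct node *take_run(struct node **listp)
{
    struct node *head = *listp;
    if (!head)
        return NULL;

    struct node *tail = head;
    struct node *it = head->next;

    while (it) {
        struct node *next = it->next;
        if (it->value >= tail->value) {        /* grow the run at the back */
            tail->next = it;
            tail = it;
        } else if (it->value < head->value) {  /* grow the run at the front */
            it->next = head;
            head = it;
        } else {
            break;                             /* this run ends here */
        }
        it = next;
    }
    tail->next = NULL;    /* terminate the run */
    *listp = it;          /* rest of the list, still unsorted */
    return head;
}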

The merging code is easiest to just paste here:

  int i = 0;
  for ( ; i < stack.size(); ++i) {
      if (!stack[i])
          break;
      run = merge(run, stack[i], comp);
      stack[i] = nullptr;
  }
  if (i < stack.size()) {
      stack[i] = run;
  } else {
      stack.push_back(run);
  }

Consider sorting the list (dagibecfjh) (ignoring runs). The stack states proceed as follows:

  [ ]
  [ (d) ]
  [ () (ad) ]
  [ (g), (ad) ]
  [ () () (adgi) ]
  [ (b) () (adgi) ]
  [ () (be) (adgi) ]
  [ (c) (be) (adgi) ]
  [ () () () (abcdefgi) ]
  [ (j) () () (abcdefgi) ]
  [ () (hj) () (abcdefgi) ]

Then finally merge all of these lists.

Note that the number of items (runs) at stack[i] is either zero or 2^i, and the stack size is bounded by 1 + log2(nruns). Each element is merged once per stack level, hence O(n log m). There is a passing similarity to Timsort here, though Timsort maintains its stack using something like a Fibonacci sequence, where this uses powers of two.

Accumulating runs takes advantage of any already sorted data, so that best-case complexity is O(n) for an already sorted list (one run). Since we are accumulating both ascending and descending runs, runs will always be at least length 2. (This reduces the maximum stack depth by at least one, paying for the cost of finding the runs in the first place.) Worst-case complexity is O(n log n), as expected, for data that is highly randomized.

(Um ... Second update.)

Or just see bottom-up merge sort on Wikipedia.

+1
Dec 08 '15 at 19:02


