Here is an implementation that makes a single pass over the list, collecting runs, and then schedules merges the same way mergesort does.
The complexity is O(n log m), where n is the number of elements and m is the number of runs. The best case is O(n) (when the data is already sorted) and the worst case is O(n log n), as expected.
It requires O(log m) temporary memory; the sort itself is performed in place, by relinking list nodes.
(Updated below; a commenter made the good point that this is worth describing here.)
The essence of the algorithm:
    while list not empty:
        accumulate a run from the start of the list
        merge the run into a stack of pending merges that simulates mergesort's recursion
    merge all runs remaining on the stack
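As a concrete sketch of that loop, assuming a minimal singly linked node type (the post doesn't show the actual list type, and the helper names accumulate_run, push_run, and collapse are mine; each is sketched further down):

    #include <vector>

    // Assumed minimal node type; the real code's list type isn't shown.
    struct Node {
        int value;
        Node* next;
    };

    // Helpers sketched below; the names are mine, not the original's.
    template <typename Compare> Node* accumulate_run(Node*& list, Compare comp);
    template <typename Compare> void push_run(std::vector<Node*>& stack, Node* run, Compare comp);
    template <typename Compare> Node* collapse(std::vector<Node*>& stack, Compare comp);

    template <typename Compare>
    Node* list_sort(Node* list, Compare comp) {
        std::vector<Node*> stack;  // stack[i] is empty or holds a merge of 2^i runs
        while (list) {
            Node* run = accumulate_run(list, comp);  // detaches a run from the front
            push_run(stack, run, comp);              // the merge loop quoted below
        }
        return collapse(stack, comp);                // merge whatever remains
    }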
Accumulating runs doesn't require much explanation, but it's worth taking the opportunity to accumulate both ascending runs and descending (reversed) runs. Here the code prepends elements that are smaller than the head of the run and appends elements that are greater than or equal to its tail. (Note that prepending has to use strictly less than to preserve sort stability; see the sketch below.)
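A sketch of that accumulation, under the same assumptions as above (my code, not the original's): building a descending run by prepending reverses it for free, and the strict comparison on the prepend side is what keeps equal elements in their original order.

    // Detach a maximal run from the front of `list` (list must be non-null).
    // Elements >= the tail extend the run at the back; elements strictly
    // less than the head extend it at the front.
    template <typename Compare>
    Node* accumulate_run(Node*& list, Compare comp) {
        Node* head = list;
        Node* tail = list;
        list = list->next;
        head->next = nullptr;
        while (list) {
            if (!comp(list->value, tail->value)) {        // next >= tail: append
                tail->next = list;
                tail = list;
                list = list->next;
                tail->next = nullptr;
            } else if (comp(list->value, head->value)) {  // next < head: prepend
                Node* n = list;
                list = list->next;
                n->next = head;
                head = n;
            } else {
                break;  // the next element extends neither end; run is done
            }
        }
        return head;
    }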
The easiest way is simply to insert the merge code here:
    // Merge the new run into the stack: occupied slots are merged in and
    // cleared, like carry propagation in a binary counter; the result lands
    // in the first free slot, or in a new slot on top.
    size_t i = 0;
    for (; i < stack.size(); ++i) {
        if (!stack[i])
            break;
        run = merge(run, stack[i], comp);
        stack[i] = nullptr;
    }
    if (i < stack.size()) {
        stack[i] = run;
    } else {
        stack.push_back(run);
    }
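The merge function itself isn't shown; below is a plausible stable two-way merge (my sketch, not the original's), plus the loop above wrapped as the push_run helper used in the driver sketch. Since the first argument is always the more recently accumulated run, ties must be resolved in favor of the second list to keep earlier elements first.

    // Stable merge of two sorted runs; `a` is always the newer run here.
    template <typename Compare>
    Node* merge(Node* a, Node* b, Compare comp) {
        Node head;          // dummy node to simplify splicing
        Node* tail = &head;
        while (a && b) {
            if (comp(a->value, b->value)) {  // a strictly less: take from a
                tail->next = a;
                a = a->next;
            } else {                         // tie or b less: take from b
                tail->next = b;
                b = b->next;
            }
            tail = tail->next;
        }
        tail->next = a ? a : b;  // splice on whatever remains
        return head.next;
    }

    // The loop above, wrapped as the helper called from the driver sketch.
    template <typename Compare>
    void push_run(std::vector<Node*>& stack, Node* run, Compare comp) {
        size_t i = 0;
        for (; i < stack.size(); ++i) {
            if (!stack[i]) break;
            run = merge(run, stack[i], comp);
            stack[i] = nullptr;
        }
        if (i < stack.size()) stack[i] = run;
        else stack.push_back(run);
    }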
Consider sorting the list (d a g i b e c f j h), ignoring run accumulation (i.e. treating each element as a run of one). The stack goes through the following states:
    [ ]
    [ (d) ]
    [ () (ad) ]
    [ (g) (ad) ]
    [ () () (adgi) ]
    [ (b) () (adgi) ]
    [ () (be) (adgi) ]
    [ (c) (be) (adgi) ]
    [ () () () (abcdefgi) ]
    [ (j) () () (abcdefgi) ]
    [ () (hj) () (abcdefgi) ]
Then, finally, everything left on the stack is merged together: (hj) with (abcdefgi), giving (abcdefghij).
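In code, that final collapse might look like this (again my sketch): merging bottom-up keeps the accumulated result on the "newer" side, so it stays the first argument to merge and the tie-breaking convention above still holds.

    // Merge all runs remaining on the stack, from bottom (newest) to top.
    template <typename Compare>
    Node* collapse(std::vector<Node*>& stack, Compare comp) {
        Node* result = nullptr;
        for (Node* r : stack)
            if (r) result = result ? merge(result, r, comp) : r;
        return result;
    }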
Note that stack[i] holds either nothing or a merge of 2^i runs, so the stack depth is bounded by 1 + log2(nruns). Each element takes part in one merge per stack level, hence O(n log m). There is a resemblance to Timsort here, although Timsort maintains its stack with something like a Fibonacci-sequence invariant, where this uses powers of two.
Accumulating runs takes advantage of any data that is already sorted, so the best case is O(n) for an already sorted list (a single run). Since we accumulate both ascending and descending runs, every run has at least 2 elements (except possibly the last), so m <= n/2; that cuts the maximum stack depth by at least one, paying back the cost of detecting runs in the first place. The worst case is O(n log n), as expected, for highly random data.
(Um ... Second update.)
Or just look up bottom-up merge sort on Wikipedia.