Why is my MergeSort so slow in Python?

I am having trouble understanding this behavior. I measure the runtime using the timeit module and get the following results for 10000 loops:

  • Merge: 1.22722930395
  • Bubble: 0.810706578175
  • Select: 0.469924766812

This is my code for MergeSort:

    def mergeSort(array):
        if len(array) <= 1:
            return array
        else:
            left = array[:len(array)/2]
            right = array[len(array)/2:]
            return merge(mergeSort(left), mergeSort(right))

    def merge(array1, array2):
        merged_array = []
        while len(array1) > 0 or len(array2) > 0:
            if array2 and not array1:
                merged_array.append(array2.pop(0))
            elif (array1 and not array2) or array1[0] < array2[0]:
                merged_array.append(array1.pop(0))
            else:
                merged_array.append(array2.pop(0))
        return merged_array

Edit:

I changed the list operations to use index pointers, and my tests now run on a list of 1000 random numbers from 0 to 1000. (By the way, I ran only 10 loops here.)

result:

  • Merge: 0.0574434420723
  • Bubble: 1.74780097558
  • Select: 0.362952293025

This is my rewritten merge definition:

    def merge(array1, array2):
        merged_array = []
        pointer1, pointer2 = 0, 0
        while pointer1 < len(array1) and pointer2 < len(array2):
            if array1[pointer1] < array2[pointer2]:
                merged_array.append(array1[pointer1])
                pointer1 += 1
            else:
                merged_array.append(array2[pointer2])
                pointer2 += 1
        while pointer1 < len(array1):
            merged_array.append(array1[pointer1])
            pointer1 += 1
        while pointer2 < len(array2):
            merged_array.append(array2[pointer2])
            pointer2 += 1
        return merged_array

seems to work pretty well :)

+3
4 answers

To start with: I can't reproduce your timing results, for 100 loops and lists of size 10000. The complete timeit benchmark of all the implementations discussed in this answer (including bubblesort and your original snippet) is published as a gist here. The average time per single run comes out as:

  • Python native (Tim) sort: 0.0144600081444
  • Bubblesort: 26.9620819092
  • (your) Original Mergesort: 0.224888720512
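
The gist itself is not reproduced here, but a minimal timeit harness along these lines might look like the sketch below. The module name sorts, the function names, and the list size are assumptions for illustration, not the actual benchmark code:

    import random
    import timeit

    def bench(sort_name, size=10000, runs=100):
        # Build a fresh random list in the setup so list creation is not timed;
        # "sorts" is a hypothetical module holding the sort functions.
        setup = ("import random; from sorts import %s; "
                 "data = [random.randint(0, 10000000) for _ in range(%d)]"
                 % (sort_name, size))
        stmt = "%s(list(data))" % sort_name   # sort a fresh copy on every run
        return timeit.timeit(stmt, setup=setup, number=runs) / runs

    # e.g. print(bench("mergeSort"))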

Now, to make your function faster, you can do several things.

  • Edit: Well, apparently I was wrong about that (thanks cwillu). Computing the length is O(1) in Python. But removing the useless computations everywhere still improves things a bit (Original Mergesort: 0.224888720512, no-len Mergesort: 0.195795390606):

     def nolenmerge(array1, array2):
         merged_array = []
         while array1 or array2:
             if not array1:
                 merged_array.append(array2.pop(0))
             elif (not array2) or array1[0] < array2[0]:
                 merged_array.append(array1.pop(0))
             else:
                 merged_array.append(array2.pop(0))
         return merged_array

     def nolenmergeSort(array):
         n = len(array)
         if n <= 1:
             return array
         left = array[:n/2]
         right = array[n/2:]
         return nolenmerge(nolenmergeSort(left), nolenmergeSort(right))
  • Secondly, as pointed out in this answer, pop(0) is linear. Rewrite your merge to pop() from the end:

     def fastmerge(array1, array2):
         merged_array = []
         while array1 or array2:
             if not array1:
                 merged_array.append(array2.pop())
             elif (not array2) or array1[-1] > array2[-1]:
                 merged_array.append(array1.pop())
             else:
                 merged_array.append(array2.pop())
         merged_array.reverse()
         return merged_array

    This is faster again: no-len Mergesort: 0.195795390606, no-len Mergesort + fastmerge: 0.126505711079

  • Third (and this would only pay off in a language that performs tail-call optimization; without that, it is a bad idea): your mergeSort is not tail-recursive. It makes the recursive calls mergeSort(left) and mergeSort(right) while work still remains in the calling frame afterwards (the merge).

    But you can make the merge sort tail-recursive using CPS (this would blow up the stack even for modest lists if you did not then do TCO):

     def cps_merge_sort(array):
         return cpsmergeSort(array, lambda x: x)

     def cpsmergeSort(array, continuation):
         n = len(array)
         if n <= 1:
             return continuation(array)
         left = array[:n/2]
         right = array[n/2:]
         return cpsmergeSort(left,
             lambda leftR: cpsmergeSort(right,
                 lambda rightR: continuation(fastmerge(leftR, rightR))))

    Once this is done, you can do the TCO by hand, deferring the call-stack management normally done by recursion to the while loop of an ordinary function (trampolining, explained for example here; the trick is originally due to Guy Steele). Trampolining and CPS work great together.

    You write a thunking function that "records" and delays application: it takes a function and its arguments, and returns a function that, when called, applies the original function to those arguments.

     thunk = lambda name, *args: lambda: name(*args) 

    Then you write a trampoline that manages the calls to thunks: it applies a thunk until the thunk returns a result (as opposed to another thunk).

     def trampoline(bouncer):
         while callable(bouncer):
             bouncer = bouncer()
         return bouncer
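
    As a quick sanity check (a made-up example, not from the original answer), the two helpers can bounce a trivial tail-recursive sum without growing the Python call stack:

     # thunk and trampoline as defined above
     thunk = lambda name, *args: lambda: name(*args)

     def trampoline(bouncer):
         while callable(bouncer):
             bouncer = bouncer()
         return bouncer

     def tsum(n, acc=0):
         if n == 0:
             return acc                      # non-callable result stops the loop
         return thunk(tsum, n - 1, acc + n)  # a thunk keeps it bouncing

     print(trampoline(thunk(tsum, 100000)))  # 5000050000, without blowing the recursion limit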

    Then all that is left is to "thunk" all the recursive calls of the original CPS function so that the trampoline unwraps them in the proper sequence. Your function now returns a thunk, without recursion (and discarding its own frame), on every call:

     def tco_cpsmergeSort(array, continuation):
         n = len(array)
         if n <= 1:
             return continuation(array)
         left = array[:n/2]
         right = array[n/2:]
         return thunk(tco_cpsmergeSort, left,
             lambda leftR: thunk(tco_cpsmergeSort, right,
                 lambda rightR: continuation(fastmerge(leftR, rightR))))

     mycpomergesort = lambda l: trampoline(tco_cpsmergeSort(l, lambda x: x))

Unfortunately, this does not come out any faster (recursive mergesort: 0.126505711079, this trampolined version: 0.170638551712). OK, I guess the stack growth of the recursive merge sort is actually modest: as soon as you leave the leftmost path in the recursion tree of array slices, the algorithm starts returning (and dropping frames). So for lists of size 10K you get a call stack of depth at most log_2(10000) ≈ 14... rather modest.

You can do a slightly more involved stack-based TCO elimination along the lines of this SO answer:

    def leftcomb(l):
        maxn, leftcomb = len(l), []
        n = maxn/2
        while maxn > 1:
            leftcomb.append((l[n:maxn], False))
            maxn, n = n, n/2
        return l[:maxn], leftcomb

    def tcomergesort(l):
        l, stack = leftcomb(l)
        while stack:                     # l sorted, stack contains tagged slices
            i, ordered = stack.pop()
            if ordered:
                l = fastmerge(l, i)
            else:
                stack.append((l, True))  # store return call
                rsub, ssub = leftcomb(i)
                stack.extend(ssub)       # recurse
                l = rsub
        return l

But this is only a little faster (trampolined mergesort: 0.170638551712, this stack-based version: 0.144994809628). Apparently Python makes building the stack frames for the recursive calls of our original mergesort pretty inexpensive.

The final results? On my machine (Ubuntu natty's stock Python 2.7.1+), the average run timings (over 100 runs, except for Bubblesort, on a list of size 10000 containing random integers between 0 and 10000000) are:

  • Python native (Tim) sort: 0.0144600081444
  • Bubblesort: 26.9620819092
  • Original Mergesort: 0.224888720512
  • no-len Mergesort: 0.195795390606
  • no-len Mergesort + fastmerge: 0.126505711079
  • trampolined CPS Mergesort + fastmerge: 0.170638551712
  • stack based merge + fastmerge: 0.144994809628
+6

list.pop(0) pops the first element and has to shift all the remaining ones; that is an extra O(n) operation that should not happen.

In addition, slicing the list object creates a copy:

    left = array[:len(array)/2]
    right = array[len(array)/2:]

This means you are also using O(n*log(n)) memory instead of O(n).

I cannot see your BubbleSort, but I bet it works in place; no wonder it is faster.

You need to rewrite your merge sort to work in place as well. Instead of copying parts of the original list, pass start and end indices, as sketched below.
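
A sketch of one way to do that (merging through a single pre-allocated auxiliary buffer and passing lo/hi indices instead of slices; the names are illustrative, not the poster's code):

    def merge_sort_inplace(a):
        aux = a[:]                    # one O(n) buffer reused for the whole sort
        _sort(a, aux, 0, len(a))
        return a

    def _sort(a, aux, lo, hi):
        if hi - lo <= 1:
            return
        mid = (lo + hi) // 2
        _sort(a, aux, lo, mid)
        _sort(a, aux, mid, hi)
        i, j = lo, mid                # merge a[lo:mid] and a[mid:hi] via aux
        for k in range(lo, hi):
            if j >= hi or (i < mid and a[i] <= a[j]):
                aux[k] = a[i]; i += 1
            else:
                aux[k] = a[j]; j += 1
        a[lo:hi] = aux[lo:hi]         # copy the merged run back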

+10

Your merge sort has a large constant factor; you need to run it on large lists to see its asymptotic advantage.

+1

Umm.. 1,000 entries?? You are still well inside the range where the polynomial coefficients dominate. Suppose I have:

  • select-sort: 15*n^2 (reads) + 5*n^2 (swaps)
  • insert-sort: 5*n^2 (reads) + 15*n^2 (swaps)
  • merge-sort: 200*n*log(n) (reads) + 1000*n*log(n) (merge)

Then you will be in a tight race for quite a while. Incidentally, 2x faster at sorting is NOTHING. Try 100x slower; that is where the real differences are felt. Try "won't finish in my lifetime" algorithms (there are well-known regular expressions that take that long to match simple strings).

So try 1M or 1G records and let us know if you still think merge-sort is not doing too well.

That said...

There are many things that make this merge sort expensive. First of all, nobody runs quick-sort or merge-sort on small data structures. Where you have if (len <= 1), people usually put: if (len <= 16): (use an inline insertion-sort) else: merge-sort, and they do this at EVERY level of the recursion.

Insertion sort has a lower constant-factor cost at small n. Note that 50% of your work is done in this last mile.
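
A sketch of that idea (the cutoff of 16 is this answer's suggestion, the function names are illustrative, and merge() is the index-based merge from the question's edit above):

    CUTOFF = 16

    def insertion_sort(a):
        for i in range(1, len(a)):
            x, j = a[i], i - 1
            while j >= 0 and a[j] > x:
                a[j + 1] = a[j]
                j -= 1
            a[j + 1] = x
        return a

    def hybrid_merge_sort(a):
        if len(a) <= CUTOFF:
            return insertion_sort(a)   # small n: insertion sort's low constant wins
        mid = len(a) // 2
        return merge(hybrid_merge_sort(a[:mid]), hybrid_merge_sort(a[mid:]))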

Next, you should not be calling array1.pop(0); maintain index counters instead. If you are lucky, Python manages the offsets at the start of the list efficiently, but, all else being equal, you are also mutating your input parameters.

Also, you know the size of the target array during the merge, so why copy and repeatedly grow merged_array? Pre-allocate the target array at the beginning of the function. That will save at least a dozen "clones" per merge level.
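
For instance, a merge could fill a pre-sized result list instead of appending (a sketch of the idea, not the poster's code):

    def merge_prealloc(a, b):
        out = [None] * (len(a) + len(b))   # target size is known up front
        i = j = k = 0
        while i < len(a) and j < len(b):
            if a[i] <= b[j]:
                out[k] = a[i]; i += 1
            else:
                out[k] = b[j]; j += 1
            k += 1
        out[k:] = a[i:] if i < len(a) else b[j:]   # copy the leftover tail
        return out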

In general, merge-sort uses 2x the size of RAM. Your algorithm probably uses 20x because of all the temporary merge buffers (hopefully Python can free the structures before the recursion returns). It breaks the elegance, but usually the best merge-sort algorithms allocate a single merge buffer equal to the size of the original array right away, and then do the trickier address arithmetic (or array-index + range-length) to keep merging the data structure back and forth between the two. It will not be as elegant as a simple recursive solution like this, but it comes somewhat close.
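
A rough sketch of that back-and-forth scheme (a bottom-up merge sort with one auxiliary buffer and index arithmetic only; an illustration under those assumptions, not a drop-in replacement for the code above):

    def bottomup_merge_sort(a):
        n = len(a)
        src, dst = a, [None] * n             # exactly one size-n merge buffer
        width = 1
        while width < n:
            for lo in range(0, n, 2 * width):
                mid = min(lo + width, n)
                hi = min(lo + 2 * width, n)
                i, j = lo, mid
                for k in range(lo, hi):      # merge src[lo:mid] and src[mid:hi]
                    if j >= hi or (i < mid and src[i] <= src[j]):
                        dst[k] = src[i]; i += 1
                    else:
                        dst[k] = src[j]; j += 1
            src, dst = dst, src              # ping-pong: swap the buffer roles
            width *= 2
        return src                           # may be the buffer, not the original list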

In C, when sorting, cache coherence is your biggest enemy. You want hot data structures so you maximize your cache hits. By allocating transient temporary buffers (even if the memory manager returns pointers to hot memory), you risk making slow DRAM round trips (pre-filling cache lines for data you are about to overwrite). This is one advantage that insertion-sort, selection-sort and quick-sort have over merge-sort (when the latter is implemented as described above).

Speaking of which, quick-sort is both naturally elegant and naturally efficient code, and it does not waste any memory (google it on Wikipedia; they have a javascript implementation to base your code on). Squeezing the last ounce of performance out of quick-sort is hard (especially in scripting languages, which is why people generally just use the C API to do that part), and you have the worst case of O(n^2). You can try to be clever and do a combination bubble-sort/quick-sort to mitigate the worst case.
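
For reference, a minimal in-place quicksort sketch (median-of-three pivot as a cheap guard against the sorted-input worst case; this is an illustration, not the Wikipedia implementation mentioned above):

    def quicksort(a, lo=0, hi=None):
        if hi is None:
            hi = len(a) - 1
        if lo >= hi:
            return a
        mid = (lo + hi) // 2
        pivot = sorted((a[lo], a[mid], a[hi]))[1]   # median of three
        i, j = lo, hi
        while i <= j:
            while a[i] < pivot:
                i += 1
            while a[j] > pivot:
                j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i += 1; j -= 1
        quicksort(a, lo, j)
        quicksort(a, i, hi)
        return a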

Happy coding.

0
