Umm .. 1,000 entries?? You are still well inside the range where the polynomial's coefficients dominate. Say I have:

select-sort: 15 * n^2 (reads) + 5 * n^2 (swaps)
insert-sort: 5 * n^2 (reads) + 15 * n^2 (swaps)
merge-sort: 200 * n * log(n) (reads) + 1000 * n * log(n) (merge)
You will be in a tight race for quite a while. And by the way, 2x is nothing when sorting ANYTHING; try 100 times slower. THAT is where the real differences show up. Just try not to end up with algorithms that won't finish in my lifetime (there are well-known regular expressions that take that long on simple strings).
So try 1M or 1G records and let us know if merge-sort still doesn't look too good.
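To make it concrete, here is a throwaway sketch that plugs a few sizes into the cost models above (the coefficients are the illustrative ones from this comment, not measurements):

    import math

    def quadratic_cost(n, reads=15, swaps=5):
        # Total ops for an n^2 sort with the coefficients above.
        return (reads + swaps) * n ** 2

    def merge_cost(n, reads=200, merges=1000):
        # Total ops for an n log n sort with the coefficients above.
        return (reads + merges) * n * math.log2(n)

    for n in (1_000, 1_000_000, 1_000_000_000):
        print(f"n={n:>13,}: n^2 sort {quadratic_cost(n):.2e} ops, "
              f"merge sort {merge_cost(n):.2e} ops")

At n = 1,000 the two are within 2x of each other; at n = 1,000,000 the n^2 sorts are already ~800x behind.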
That said ..
There are many things that make this kind of merge-sort expensive. First of all, nobody quick-sorts or merge-sorts all the way down to tiny data structures. Where you have (len <= 1), people usually put: if (len <= 16): (use insertion-sort) else: merge-sort, at EVERY level of the recursion.
That's because insertion-sort has a lower coefficient cost at small n. Note that roughly 50% of your work is done on this last mile.
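A minimal sketch of that cutoff (16 is a typical threshold, not a tuned one, and heapq.merge just stands in for your own merge step):

    import heapq

    def insertion_sort(a):
        # In-place insertion sort: cheap constant factors at small n.
        for i in range(1, len(a)):
            v = a[i]
            j = i - 1
            while j >= 0 and a[j] > v:
                a[j + 1] = a[j]
                j -= 1
            a[j + 1] = v
        return a

    def hybrid_merge_sort(a):
        # Bottom out into insertion sort instead of recursing to len <= 1.
        if len(a) <= 16:
            return insertion_sort(a)
        mid = len(a) // 2
        return list(heapq.merge(hybrid_merge_sort(a[:mid]),
                                hybrid_merge_sort(a[mid:])))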
Next, you should not be calling array1.pop(0); maintain index counters instead. If you're lucky, Python manages the offset to the head of the list efficiently, but even then, ceteris paribus, you are mutating your input parameters.
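Something like this for the merge itself (the names are mine, not the parent's code):

    def merge(a, b):
        # Walk both sorted inputs with index counters; list.pop(0) is
        # O(n) per call because it shifts every remaining element left.
        i = j = 0
        merged = []
        while i < len(a) and j < len(b):
            if a[i] <= b[j]:
                merged.append(a[i])
                i += 1
            else:
                merged.append(b[j])
                j += 1
        merged.extend(a[i:])
        merged.extend(b[j:])
        return merged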
In addition, you know the size of the target array going into the merge, so why let merged_array copy-and-double as it grows? Preallocate the target array to its final size at the top of the function. That saves at least a dozen "clones" per merge level.
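As a sketch (in CPython, append() already over-allocates geometrically, so the win is smaller than in C, but the principle is the same):

    def merge_prealloc(a, b):
        # The output size is known up front, so allocate it once
        # instead of letting append() grow-and-copy the list.
        merged = [None] * (len(a) + len(b))
        i = j = 0
        for k in range(len(merged)):
            if j >= len(b) or (i < len(a) and a[i] <= b[j]):
                merged[k] = a[i]
                i += 1
            else:
                merged[k] = b[j]
                j += 1
        return merged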
In general, merge-sort needs 2x the RAM of its input. Your algorithm probably uses more like 20x, due to all the temporary merge buffers (hopefully Python can free those structures before the recursion unwinds). It breaks the elegance, but as a rule, the best merge-sort implementations allocate a single merge buffer equal to the size of the original array up front, then do hairy address arithmetic (or array-index plus range-length bookkeeping) to keep merging the data back and forth between the two. It won't be as elegant as a simple recursive solution like this one, but it comes reasonably close.
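A rough sketch of that shape; this is the textbook "ping-pong" trick with index arithmetic, not anyone's exact code:

    def merge_sort_pingpong(a):
        # Exactly one auxiliary buffer the size of the input (2x RAM
        # total); each level merges back and forth between the two
        # instead of allocating temporary lists per merge.
        src = list(a)
        aux = src[:]

        def sort(s, d, lo, hi):
            # Sorts s[lo:hi], leaving the result in d[lo:hi].
            # On entry, s and d hold identical data in that range.
            if hi - lo <= 1:
                if hi - lo == 1:
                    d[lo] = s[lo]
                return
            mid = (lo + hi) // 2
            # Swap roles for the children so their output lands in s...
            sort(d, s, lo, mid)
            sort(d, s, mid, hi)
            # ...then merge the two sorted halves of s into d.
            i, j = lo, mid
            for k in range(lo, hi):
                if j >= hi or (i < mid and s[i] <= s[j]):
                    d[k] = s[i]
                    i += 1
                else:
                    d[k] = s[j]
                    j += 1

        if src:
            sort(aux, src, 0, len(src))
        return src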
When sorting in C, cache coherence is your biggest enemy. You want hot data structures that make the most of your cache. By allocating transient temporary buffers (even if the memory manager hands back pointers to hot memory), you risk slow DRAM round-trips (pre-filling cache lines for data you are about to overwrite anyway). This is one advantage insertion-sort, selection-sort, and quicksort have over merge-sort (unless it's implemented as described above).
Something like quick-sort is naturally elegant code, naturally efficient code, and wastes no memory (look it up on Wikipedia; they have the JavaScript implementation your code is based on). Squeezing the last ounce of performance out of quicksort is hard (especially in scripting languages, which is why this part is usually handed off to the C API), and you have that O(n^2) worst case; you can try to be clever with a bubble-sort / quick-sort combination to mitigate it.
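For comparison, a minimal in-place quicksort sketch; the random pivot here is just one common way to make the O(n^2) worst case unlikely, not necessarily the hybrid mentioned above:

    import random

    def quicksort(a, lo=0, hi=None):
        # In place, no merge buffers: partition around a pivot, recurse.
        if hi is None:
            hi = len(a) - 1
        if lo >= hi:
            return a
        # A random pivot makes adversarial O(n^2) inputs unlikely.
        p = random.randint(lo, hi)
        a[p], a[hi] = a[hi], a[p]
        pivot, store = a[hi], lo
        for i in range(lo, hi):
            if a[i] < pivot:  # Lomuto partition
                a[i], a[store] = a[store], a[i]
                store += 1
        a[store], a[hi] = a[hi], a[store]
        quicksort(a, lo, store - 1)
        quicksort(a, store + 1, hi)
        return a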
Happy coding.