Why should we use n-way merge? What are its advantages over a two-way merge?

I tried to read several articles about n-way merge, but did not understand the concept. Why would you use an n-way merge over a two-way merge? For example, you could divide an array into 3 parts, sort them, two-way merge two of the parts, and then two-way merge the third part with that combined result :)

thanks

+6
2 answers

In the "normal" merge sort, you divide the array in 2 until you reach a depth of log₂ n, and then start merging. Each merge of two arrays of size m takes 2m operations.

This will lead you to the following formula (when analyzing time):

  n/2 * 2 + n/4 * 4 + ... + 1 * n = n * log₂ n
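To make the cost model concrete, here is a minimal sketch of that "normal" two-way merge sort (function names are my own, not from the answer):

```python
def merge_sort(a):
    """Classic two-way merge sort: split in half, sort each half, merge."""
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left = merge_sort(a[:mid])
    right = merge_sort(a[mid:])
    # Merge step: one comparison per output element, so merging
    # two runs with m elements in total costs O(m) operations.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```

Each recursion level processes all n elements once, and there are log₂ n levels, which is where the formula comes from.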

Now, if you do a three-way merge, you divide the array into 3 parts. The difference from the previous method is twofold:

  • The division depth is now log₃ n.
  • During a merge, instead of comparing two elements, you need to find the minimum of 3 elements.

This means that in the most basic implementation, you get the following formula:

  n/3 * 2*3 + n/9 * 2*9 + ... + 1 * 2*n = 2 * n * log₃ n

Note the factor of 2: finding the minimum of three elements takes two comparisons.
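The naive three-way merge described here can be sketched as follows (a minimal illustration, not the answer's own code); note the two comparisons needed to pick each output element:

```python
def merge3(a, b, c):
    """Naive three-way merge of three sorted lists.

    Finding the smallest of the three current heads takes up to
    two comparisons per output element, hence the factor of 2."""
    INF = float("inf")
    out = []
    i = j = k = 0
    while i < len(a) or j < len(b) or k < len(c):
        # Treat an exhausted list as having an infinite head.
        x = a[i] if i < len(a) else INF
        y = b[j] if j < len(b) else INF
        z = c[k] if k < len(c) else INF
        # Two comparisons select the minimum of the three heads.
        if x <= y and x <= z:
            out.append(x); i += 1
        elif y <= z:
            out.append(y); j += 1
        else:
            out.append(z); k += 1
    return out
```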

Asymptotically, both are Θ(n log n). However, it is possible (I have not tried it) that in practice three-way sorting gives better performance due to its shallower log₃ n depth. But since log₂ n for n = 1,000,000 is only 20, and log₃ n for the same number is about 12.5, I doubt this optimization will be really effective unless n is big enough.


With a smart implementation, k-way merging can really benefit merge sort. The idea is that once you have found the minimum of k elements, you already know the relative order of the other k-1 elements that were not minimal. Therefore, after consuming that minimal element from its list, you only need to take the new head of that list and find its place relative to the other k-1 elements. Using a heap, that is pretty trivial.
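A heap-based k-way merge along these lines might look like the following sketch (using Python's standard `heapq`; the function name is my own):

```python
import heapq

def kway_merge(runs):
    """k-way merge of sorted lists using a min-heap.

    After popping the smallest head, only the replacement element
    from the same run has to be sifted into place, which costs
    O(log k) instead of k-1 fresh comparisons per output element."""
    heap = []
    for r, run in enumerate(runs):
        if run:
            # (value, run index, position) keeps heap ordering total.
            heapq.heappush(heap, (run[0], r, 0))
    out = []
    while heap:
        value, r, pos = heapq.heappop(heap)
        out.append(value)
        if pos + 1 < len(runs[r]):
            heapq.heappush(heap, (runs[r][pos + 1], r, pos + 1))
    return out
```

The standard library also provides `heapq.merge`, which does essentially this over arbitrary sorted iterables.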


Be sure to check Jerry's answer. I agree with him that the true power of multiway merging shows up when working with multiple disks and parallel processing.

+7

Usually, you merge more than two runs at a time when you are doing an external sort. For example, suppose you need to sort a terabyte of data, and you have only (say) 64 gigabytes of RAM.

You usually do this by reading in 64 gigabytes, sorting it, and then writing it out. Repeat for the full terabyte of data, creating one intermediate file for each "piece" you can hold in memory at once. There are ways to improve on this, but at best you can usually hope to produce sorted intermediate files of about 128 gigabytes each.

This leaves you with several intermediate files to merge together, and their number will almost certainly be more than 2.

If you do this regularly, you probably have pretty high-quality hardware. If you put each intermediate file on a separate disk (plus at least one more for output), you can almost certainly improve speed by merging all the runs together at once rather than just two at a time. The process will usually be I/O-bound, so reading from (say) 8 disks at a time will usually be about 4 times faster than reading from only 2 disks at a time (though that depends on your output disk having that much throughput, which may not be the case). And since you do not create more intermediate files (which would require further merge passes), your overall speed is likely to improve by an even larger factor.
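The external-sort scheme described above can be sketched in miniature as follows. This is an illustration under simplified assumptions (the data fits in a Python list, `chunk_size` stands in for "what fits in RAM", and temporary files play the role of the intermediate run files); the function name is my own:

```python
import heapq
import os
import tempfile

def external_sort(values, chunk_size):
    """Sketch of an external sort: sort chunk_size-sized pieces into
    temporary run files, then k-way merge ALL runs in a single pass
    with heapq.merge instead of merging just two at a time."""
    paths = []
    for start in range(0, len(values), chunk_size):
        chunk = sorted(values[start:start + chunk_size])
        f = tempfile.NamedTemporaryFile("w", delete=False, suffix=".run")
        f.write("\n".join(map(str, chunk)))
        f.close()
        paths.append(f.name)

    def read_run(path):
        # Stream one sorted run back from disk, an element at a time.
        with open(path) as fh:
            for line in fh:
                yield int(line)

    # One multiway merge pass over all runs at once.
    result = list(heapq.merge(*(read_run(p) for p in paths)))
    for p in paths:
        os.remove(p)
    return result
```

With real disks, each run file would live on its own spindle so the single merge pass can read from all of them in parallel.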

+10
