Merging sorted arrays, what is the optimal time complexity?

Question

Merging sorted arrays, what is the optimal time complexity?

I have m arrays, each array has a length n. Each array is sorted. I want to create a single array of length m * n containing all the values of previous arrays (including duplicate values), sorted. I have to combine these arrays.

I think the optimal time complexity is m * n * log (m)

Here's a sketch of the algorithm ..

I am creating an H support array from lenth m containing all the values of the first element of each array.

Then I sort this array (m log m) and move the min value to the output array.

Then I replace the moved value with the next one from the array that was made. In fact, I do not replace it, but I insert it into the desired (sorted) position. I think it will take a magazine.

And I repeat this for all m * n values ... therefore m * n * log m

My question is ... can you come up with a more efficient algorithm? If mnlogm is actually optimal, can you at least think of a simpler and more elegant algorithm?

+6

sorting arrays algorithm complexity-theory data-structures

francesco delvinis Feb 25 '11 at 10:20

source share

2 answers

ltjax · Answer 1 · 2011-02-25T10:34:40+0000

The difficulty is right! However, there is a small mistake in your idea of the algorithm: you cannot insert an element into a sorted array in log m . You can find your position using binary search in this complexity, but you may have to move elements around to place them there. To fix this problem, you can use the heap data structure!

Multithreaded merging (which is the common name of your algorithm) is usually accomplished with another “merge” of the data structure: the tournament tree. You can find a description in Knuth's “The Art of Computer Programming" (sorting chapter, iirc). It has a lower constant factor in theory and practice compared to heaps in this particular case.

If you want to take a look at the implementation, I am sure that parallel multi-threaded merging in parallel extensions of the GNU C ++ standard library is implemented in this way.

Edit: I referenced the wrong book, which is now fixed.

TimCodes.NET · Answer 2 · 2011-02-25T10:35:12+0000

The best you can do is O (m * n + d). Similar to counting sorting: http://en.wikipedia.org/wiki/Counting_sort If you know the range of possible values (d, say), you can initialize an array of length d and then scan each of the arrays m, adding 1 to each "bunker" in d for each value corresponding to this hopper. Then in your new array of length m * n for each value from d you add, however, a lot of counters that have a bit.

Merging sorted arrays, what is the optimal time complexity?

More articles: