Sort an almost sorted array (elements are inappropriate no more than k)

I was recently asked this question:

You are given an array that is almost sorted, since each of the elements of N can be lost by no more than k positions from the correct ordered order. Find an efficient space and time algorithm for sorting an array.

I have an O(N log k) solution as follows.

Denote arr[0..n) to denote the elements of the array from index 0 (inclusive) to N (excluding).

  • Sort arr[0..2k)
    • Now we know that arr[0..k) are in their final sorted positions ...
    • ... but arr[k..2k) can still be lost k !
  • Sort arr[k..3k)
    • Now we know that arr[k..2k) are in their final sorted positions ...
    • ... but arr[2k..3k) can still be lost k
  • Sort arr[2k..4k)
  • ....
  • Until you sort arr[ik..N) , then you're done!
    • This last step may be cheaper than other steps when there are less than 2k items left.

At each step, you sort no more than 2k elements in O(k log k) , placing at least the elements k in their final sorted positions at the end of each step. There are steps O(N/k) , so the overall complexity is O(N log k) .

My questions:

  • Optimal is O(N log k) ? Can this be improved?
  • Can you do this without (partially) re-sorting the same elements?
+59
sorting arrays algorithm
Apr 28 '10 at 4:21
source share
5 answers

As Bob Sedgwick demonstrated his dissertation work (and subsequent ones), insertion sorting absolutely suppresses an “almost sorted array”. In this case, your asymptotics look good, but if k <12 I place bets, sorting sorts every time. I don’t know that there is a good explanation of why insertion sorting does this well, but the place to search will be in one of Sedgewick's tutorials called “Algorithms” (he made many issues for different languages).

  • I have no idea if O (N log k) is optimal, but more accurate, I don't care if k is small, these are constant factors that matter, and if k is large, you can just sort the array.

  • Sorting the insert will cause this problem without re-sorting the same elements.

The Big-O notation is very suitable for the class of algorithms, but in the real world, constants are important. It's too easy to overlook this. (And I say this as a professor who taught Big-O notation!)

+34
Apr 28 '10 at 4:40
source share

When using only the comparison model, O (n log k) is optimal. Consider the case when k = n.

To answer your other question, yes, this can be done without sorting using heaps.

Use a mini bunch of 2k items. Insert 2k elements first, then remove min, insert next element, etc.

This guarantees O (n log k) and O (k) times, and heaps usually have fairly small hidden constants.

+19
Apr 28 '10 at 22:52
source share

Since k seems to be fairly small, insertion sorting is probably the most obvious and generally accepted algorithm.

In sorting inserts on random elements, you have to scan through N elements, and you need to move each of them on average at N / 2 positions, which gives the general N * N / 2 operations. The constant "/ 2" is ignored in the large-O (or similar) characteristic, which gives the complexity of O (N 2 ).

In the case where you offer, the expected number of operations is ~ N * K / 2, but since k is a constant, the entire term k/2 ignored in the characteristic of large O, so the overall complexity is O (N).

+6
Apr 28 '10 at 4:36
source share

Your solution is good if k is large enough. There is no better solution in terms of time complexity; each element may turn out to be inappropriate at places k , which means that you need to study the bit of information log2 k to place it correctly, which means that you need to make at least log2 k comparisons - so this should be a complexity of at least O(N log k) .

However, as others have pointed out, if k small, the regular members are going to kill you. In this case, use something very quick for each operation, such as sorting an insert.

If you really wanted to be optimal, you would use both methods and switch from one to the other depending on k .

+6
Apr 28 '10 at 5:01
source share

It has already been pointed out that one of the asymptotically optimal solutions uses a bunch of minutes, and I just wanted to provide the code in Java:

 public void sortNearlySorted(int[] nums, int k) { PriorityQueue<Integer> minHeap = new PriorityQueue<>(); for (int i = 0; i < k; i++) { minHeap.add(nums[i]); } for (int i = 0; i < nums.length; i++) { if (i + k < nums.length) { minHeap.add(nums[i + k]); } nums[i] = minHeap.remove(); } } 
+6
Feb 18 '16 at 2:57
source share



All Articles