When is it worth sorting an array?

A few years ago, during an interview, I was asked: when is it worth sorting an array? I remember that I couldn’t answer correctly, I recently took a course in the algorithm, and I came to the conclusion that providing a more “academic” answer would probably provide me with this work ... in any case, it is impossible to correct the past while I try officially answer it myself, currently I'm here:

For the array, the search time will be

  • O (n) if not sorted
  • O (log (n)) when sorting

Given that quick sorts are sorted in O (n * log (n))

When is it worth sorting an array? Of course, this depends on how many times we look for the array.

  • Search cost x times in a sorted array = O (n * log (n)) + [O (log (n)) * x]
  • Search cost x times in unsorted array = O (n) * x

What will be the value of x?

+7
sorting arrays algorithm big-o
source share
1 answer

Personally, I would answer that it is worth sorting an array if one of the following conditions is true:

  • we plan to often request the highest value in the array (reduce the cost from O (n) to O (1)),
  • we plan to often request the lowest value in the array (reduce the cost from O (n) to O (1)),
  • we will often search for a given value in an array (reduce the cost from O (n) to O (log (n)).

If you can sort the array in O (n) (for example, the data meets the sorting criteria), we will start to profit from the search operation (so that the total time, including the time required for sorting, will be less than the time spent searching in an unsorted array) after k operations, where k = constantOfSortingOperation / (n / log (n)) (the time required to sort the array divided by the gain from the Sorter search).

If we sort the array in O (nlogn) using for ex. HeapSort or QuickSort (where the constant hidden in the Big-O notation is small), we will start to get from the search operation after k = (constant * nlogn) / (n / logn) . constant / nlogn is basically how many times we could look for an unsorted array if we spend time not on sorting, but on searching. n / logn - how much we get from one search in the sorter array compared to a search in an unsorted array. Therefore, if we believe that our constant is small (much less than n), the time after the start of receipt (= x, more or less) will be approximately n * logn * logn / n = (log (n)) ^ 2.

If we turn on the calculation of the winnings from getting the largest / smallest value, we start to get the array from sorting much faster.

+2
source share

All Articles