Finding the median of an array in minimum time

I have an array, say a = {1,4,5,6,2,23,4,2}. I need to find the median of positions 2 to 6 (an odd number of terms), so I copied a[1] through a[5] into arr[0] through arr[4], sorted arr, and took arr[2] as the median.

But this way I have to copy the values from one array into another every time, so that my original array stays unchanged. Secondly, I sort, so this procedure takes quite a lot of **time**. Is there a different way to do this that reduces my computation time?

Are there any sites or materials that explain what to do and how?

+8
6 answers

If you are running multiple queries against the same array, you can use a segment tree. Segment trees are usually used for range minimum/maximum and range sum queries, but you can adapt them to range median queries.

A segment tree for a set of n intervals uses O(n log n) storage and can be built in O(n log n) time. A range query can be answered in O(log n).

An example of a range-median segment tree:

You build the segment tree bottom-up (and update it top-down):

                      [5]
          [3]                     [7]
   [1,2]        [4]         [6]         [8]
  1     2     3     4     5     6     7     8

Indices covered by each node:

                      [4]
          [2]                     [6]
   [0,1]        [3]         [5]         [7]
  0     1     2     3     4     5     6     7

A median query over the indices in the range 4-6 goes down this path of values:

                      [4]
                            [5]
  0     1     2     3     4     5     6     7

When searching for the median, you know the total number of elements in the query (3), and the median of this range is the second element (index 5). So you are essentially searching for the first node that contains that index, which is the node with values [1,2] (indices 0,1).

Searching for the median of the range 3-6 is a bit more complicated, because you have to look for two indices (4, 5), which happen to lie in the same node.

                      [4]
          [6]               [5]
  0     1     2     3     4     5     6     7
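
The answer above does not spell out an implementation, so here is a minimal sketch of one related variant, a "merge sort tree": a segment tree in which every node keeps a sorted copy of the elements it covers, which matches the O(n log n) storage and build time quoted above. The class name `MergeSortTree` and its members are hypothetical, and this particular query (a binary search over values combined with a per-node count) is simpler but slower than the O(log n) bound mentioned in the answer.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical sketch: a segment tree whose nodes store sorted copies of
// the elements they cover. The source array is never modified.
class MergeSortTree {
public:
    explicit MergeSortTree(const std::vector<int>& a)
        : n_(a.size()), tree_(2 * a.size()) {
        // Leaves live at positions n_ .. 2*n_-1.
        for (std::size_t i = 0; i < n_; ++i) tree_[n_ + i] = {a[i]};
        // Build internal nodes bottom-up by merging the two children.
        for (std::size_t i = n_; i-- > 1;) {
            std::merge(tree_[2 * i].begin(), tree_[2 * i].end(),
                       tree_[2 * i + 1].begin(), tree_[2 * i + 1].end(),
                       std::back_inserter(tree_[i]));
        }
    }

    // Number of elements <= x among a[l], ..., a[r-1] (half-open range).
    std::size_t countLessEqual(std::size_t l, std::size_t r, int x) const {
        std::size_t count = 0;
        for (l += n_, r += n_; l < r; l /= 2, r /= 2) {
            if (l & 1) count += rank(tree_[l++], x);
            if (r & 1) count += rank(tree_[--r], x);
        }
        return count;
    }

    // k-th smallest (0-based) among a[l], ..., a[r-1].
    // Assumes the tree is non-empty and k < r - l.
    int kth(std::size_t l, std::size_t r, std::size_t k) const {
        const std::vector<int>& sorted = tree_[1];  // all values, sorted
        std::size_t lo = 0, hi = sorted.size() - 1;
        while (lo < hi) {  // find the smallest value with more than k elements <= it
            const std::size_t mid = lo + (hi - lo) / 2;
            if (countLessEqual(l, r, sorted[mid]) > k) hi = mid;
            else lo = mid + 1;
        }
        return sorted[lo];
    }

private:
    static std::size_t rank(const std::vector<int>& v, int x) {
        return static_cast<std::size_t>(
            std::upper_bound(v.begin(), v.end(), x) - v.begin());
    }

    std::size_t n_;
    std::vector<std::vector<int>> tree_;
};
```

For the array from the question, `MergeSortTree tree(a); tree.kth(1, 6, 2)` asks for the median of a[1] through a[5].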

Segment tree

Minimum range query on a segment tree

+4

Use std::nth_element from <algorithm>, which is O(N) on average:

  std::nth_element(a, a + size / 2, a + size);
  median = a[size / 2];
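
Since std::nth_element reorders the elements it works on, one way to leave the original array untouched is to run it on a copy of the subrange. The following is a minimal sketch under that assumption; the helper name `medianOfRange` and its inclusive 0-based bounds are hypothetical, and it assumes a valid range with an odd number of elements.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: median of a[first..last] (inclusive, 0-based).
// Assumes first <= last < a.size() and an odd number of elements.
int medianOfRange(const std::vector<int>& a, std::size_t first, std::size_t last) {
    // Copy the subrange so the caller's array is not reordered.
    std::vector<int> tmp(a.begin() + static_cast<std::ptrdiff_t>(first),
                         a.begin() + static_cast<std::ptrdiff_t>(last) + 1);
    auto mid = tmp.begin() + static_cast<std::ptrdiff_t>(tmp.size() / 2);
    // Partially reorder tmp so that *mid is the element a full sort would put there.
    std::nth_element(tmp.begin(), mid, tmp.end());
    return *mid;
}
```

For the question's example, `medianOfRange(a, 1, 5)` looks at a[1] through a[5].
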
+22

You can find the median without sorting, in O(n) time; the algorithms that do this are called selection algorithms.
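
As an illustration of what a selection algorithm looks like, here is a minimal quickselect sketch. The function name `quickselect` and the choice of the middle element as pivot are assumptions made for this example; this variant runs in linear time on average but can degrade to quadratic time on unlucky inputs.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Returns the k-th smallest element (0-based) of v.
// v is taken by value, so the caller's data is not modified.
// Assumes v is non-empty and k < v.size().
int quickselect(std::vector<int> v, std::size_t k) {
    std::size_t lo = 0, hi = v.size() - 1;
    while (lo < hi) {
        // Move the middle element to the end and use it as the pivot.
        std::swap(v[lo + (hi - lo) / 2], v[hi]);
        const int pivot = v[hi];
        // Lomuto partition: elements smaller than the pivot go to the front.
        std::size_t store = lo;
        for (std::size_t i = lo; i < hi; ++i) {
            if (v[i] < pivot) std::swap(v[i], v[store++]);
        }
        std::swap(v[store], v[hi]);  // pivot is now at its final sorted position
        if (k == store) return v[k];
        if (k < store) hi = store - 1;  // continue in the left part only
        else lo = store + 1;            // continue in the right part only
    }
    return v[lo];
}
```

The median of an odd-length vector `v` is then `quickselect(v, v.size() / 2)`.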

+15

To find the median of an array of fewer than 9 elements, I believe the most efficient approach is a simple sorting algorithm such as insertion sort. Its complexity is bad, but for such a small array, because of the constant factor k hidden in the complexity of better algorithms such as quicksort, insertion sort is very efficient. Run your own benchmark, but I expect you will get better results with insertion sort than with shell sort or quicksort.
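
A minimal sketch of that idea, assuming a small, non-empty, odd-length input; the function name `smallMedian` is hypothetical, and the vector is taken by value so that the caller's data stays unsorted.

```cpp
#include <cstddef>
#include <vector>

// Median of a small, non-empty, odd-length sequence via insertion sort.
// Works on a copy (pass by value), so the caller's data is unchanged.
int smallMedian(std::vector<int> v) {
    for (std::size_t i = 1; i < v.size(); ++i) {
        const int key = v[i];
        std::size_t j = i;
        // Shift larger elements one slot to the right to make room for key.
        while (j > 0 && v[j - 1] > key) {
            v[j] = v[j - 1];
            --j;
        }
        v[j] = key;
    }
    return v[v.size() / 2];
}
```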

+1

I think the best way is to use the median-of-medians algorithm for finding the k-th largest element of an array. You can find the general idea of the algorithm here: Median of medians in Java, on Wikipedia: http://en.wikipedia.org/wiki/Selection_algorithm#Linear_general_selection_algorithm_-_Median_of_Medians_algorithm , or just search the Internet. Some general improvements can be made during implementation (for example, avoiding sorting when choosing the median of the small groups). However, note that for an array of fewer than 50 elements it is more efficient to use insertion sort than the median-of-medians algorithm.
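
To make the idea concrete, here is a simplified sketch of selection with a median-of-medians pivot. It is not taken from the linked material: the function name `momSelect` is hypothetical, it selects the k-th smallest element (0-based), and it partitions into temporary vectors instead of working in place, which keeps the code short at the cost of extra memory.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Returns the k-th smallest element (0-based) of v.
// Assumes v is non-empty and k < v.size(). v is taken by value.
int momSelect(std::vector<int> v, std::size_t k) {
    while (true) {
        if (v.size() <= 5) {  // small input: sorting directly is cheapest
            std::sort(v.begin(), v.end());
            return v[k];
        }
        // Split into groups of (at most) 5 and collect each group's median.
        std::vector<int> medians;
        for (std::size_t i = 0; i < v.size(); i += 5) {
            const std::size_t end = std::min(i + 5, v.size());
            std::sort(v.begin() + static_cast<std::ptrdiff_t>(i),
                      v.begin() + static_cast<std::ptrdiff_t>(end));
            medians.push_back(v[i + (end - i) / 2]);
        }
        // The pivot is the median of the group medians, found recursively.
        const int pivot = momSelect(medians, medians.size() / 2);
        // Three-way partition around the pivot.
        std::vector<int> less, greater;
        std::size_t equal = 0;
        for (int x : v) {
            if (x < pivot) less.push_back(x);
            else if (x > pivot) greater.push_back(x);
            else ++equal;
        }
        if (k < less.size()) {
            v = std::move(less);
        } else if (k < less.size() + equal) {
            return pivot;
        } else {
            k -= less.size() + equal;
            v = std::move(greater);
        }
    }
}
```

The median of an odd-length vector `v` is `momSelect(v, v.size() / 2)`.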

0

All existing answers have some drawbacks in certain situations:

  1. Sorting the entire subrange is not very efficient, because you do not need a fully sorted subrange to get the median, and you need an additional array if you want to find the medians of several subranges.
  2. Using std::nth_element is more efficient, but it still mutates the subrange, so an extra array is still required.
  3. Using a segment tree gives you an efficient solution, but you need to either implement the structure yourself or use a third-party library.

For this reason, I am posting my approach, which uses std::map and is inspired by the selection sort algorithm:

  1. First, collect the frequencies of the elements of the first subrange into a std::map<int, int> object.
  2. Using this object, we can efficiently find the median of a subrange whose length is subrangeLength :

     #include <map>
     #include <optional>

     double median(const std::map<int, int> &histogram, int subrangeLength)
     {
         const int middle{subrangeLength / 2};
         int count{0};

         /* We use the fact that keys in std::map are sorted, so by simply
            iterating and adding up the frequencies, we can find the median. */
         if (subrangeLength % 2 == 1) {
             for (const auto &freq : histogram) {
                 count += freq.second;
                 /* In case where subrangeLength is odd, "middle" is the lower
                    integer bound of subrangeLength / 2, so as soon as we cross
                    it, we have found the median. */
                 if (count > middle) {
                     return freq.first;
                 }
             }
         } else {
             std::optional<double> medLeft;
             for (const auto &freq : histogram) {
                 count += freq.second;
                 /* In case where subrangeLength is even, we need to pay
                    attention to the case when elements at positions middle
                    and middle + 1 are different. */
                 if (count == middle) {
                     medLeft = freq.first;
                 } else if (count > middle) {
                     if (!medLeft) {
                         medLeft = freq.first;
                     }
                     return (*medLeft + freq.first) / 2.0;
                 }
             }
         }

         /* Only reached if the histogram holds fewer than subrangeLength elements. */
         return -1;
     }
  3. Now, when we want the median of the next subrange, we simply update the histogram: decrement the frequency of the element that leaves the subrange (erasing its key once the count reaches zero, so that empty entries cannot be mistaken for the left median in the even-length case) and insert or increment the entry for the element that enters it. With std::map each of these updates takes time logarithmic in the number of distinct values, not constant time. Then we calculate the median again and continue like this until all subranges have been processed.
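
The three steps above can be sketched end to end as follows. This is an illustration rather than the answer's own code: the names `valueAtRank` and `slidingMedians` are hypothetical, the median is obtained by looking up the two middle ranks instead of calling the `median` function shown earlier, and it assumes consecutive subranges of a fixed length `w` that each shift by one element. It uses C++17 structured bindings.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Value at 0-based rank r in the multiset described by the histogram.
static int valueAtRank(const std::map<int, int>& histogram, int r) {
    int count = 0;
    for (const auto& [value, freq] : histogram) {  // keys are visited in sorted order
        count += freq;
        if (count > r) return value;
    }
    return -1;  // not reached when r is smaller than the total count
}

// Medians of all length-w windows of a, from left to right.
std::vector<double> slidingMedians(const std::vector<int>& a, std::size_t w) {
    std::vector<double> result;
    if (w == 0 || w > a.size()) return result;

    // Step 1: histogram of the first window.
    std::map<int, int> histogram;
    for (std::size_t i = 0; i < w; ++i) ++histogram[a[i]];

    const int n = static_cast<int>(w);
    for (std::size_t start = 0;; ++start) {
        // Step 2: the two middle ranks coincide when w is odd.
        const int lo = valueAtRank(histogram, (n - 1) / 2);
        const int hi = valueAtRank(histogram, n / 2);
        result.push_back((static_cast<double>(lo) + hi) / 2.0);
        if (start + w == a.size()) break;

        // Step 3: slide the window by one element.
        auto it = histogram.find(a[start]);
        if (--it->second == 0) histogram.erase(it);  // drop empty entries
        ++histogram[a[start + w]];
    }
    return result;
}
```

The original array is only read, never reordered, so no scratch copy of each subrange is needed.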

0