Find the number of elements greater than x in a given range

Question

Find the number of elements greater than x in a given range

Given an array with n elements, how to find the number of elements greater than or equal to a given value (x) in a given index of the range i, index j in O (log n) complexity?

The queries have the form (i, j, x), which means finding the number of elements greater than x from the i-th to the j-th element in the array

The array is not sorted. i, j and x are different for different requests. The elements of the array are static. Edit: i, j, x can all be different for different requests!

+9

algorithm

harunrashid Sep 7 '16 at 7:28

source share

5 answers

This is only possible if you have sorted the array. In this case, a binary search for the smallest value conveys your condition and calculates the score simply by dividing your range of indices by the found position into two intervals. Then just calculate the length of the interval that conveys your condition.

If the array is not sorted, and you need to keep its order, you can use index sorting. When combining:

Definition
Let <i0,i1> be your used range of indices and x be your value.
index sorting array <i0,i1>
so create an array of size m=i1-i0+1 and index its sorting. This task is O(m.log(m)) , where m<=n .
binary search x position in an array of indices
This task is O(log(m)) and you need an index j = <0,m) for which array[index[j]]<=x is the smallest value <=x
count the amount
Just count the number of indices after j to m
```
 count = mj; 
```

As you can see if the array is sorted, you got O(log(m)) complexity, but if it is not, you need to sort O(m.log(m)) , which is worse than the naive O(m) approach that follows use only if the array changes frequently and cannot be sorted directly.

[Edit1] What do I mean by index sorting

By type of indexes, I mean the following: let it have an array a

 a[] = { 4,6,2,9,6,3,5,1 }

Sorting an index means that you create a new array of ix indices in sorted order, for example, by increasing the sorting of the index:

 a[ix[i]]<=a[ix[i+1]]

In our example, sorting the index bubbles looks something like this:

 // init indexes a[ix[i]]= { 4,6,2,9,6,3,5,1 } ix[] = { 0,1,2,3,4,5,6,7 } // bubble sort 1st iteration a[ix[i]]= { 4,2,6,6,3,5,1,9 } ix[] = { 0,2,1,4,5,6,7,3 } // bubble sort 2nd iteration a[ix[i]]= { 2,4,6,3,5,1,6,9 } ix[] = { 2,0,1,5,6,7,4,3 } // bubble sort 3th iteration a[ix[i]]= { 2,4,3,5,1,6,6,9 } ix[] = { 2,0,5,6,7,1,4,3 } // bubble sort 4th iteration a[ix[i]]= { 2,3,4,1,5,6,6,9 } ix[] = { 2,5,0,7,6,1,4,3 } // bubble sort 5th iteration a[ix[i]]= { 2,3,1,4,5,6,6,9 } ix[] = { 2,5,7,0,6,1,4,3 } // bubble sort 6th iteration a[ix[i]]= { 2,1,3,4,5,6,6,9 } ix[] = { 2,7,5,0,6,1,4,3 } // bubble sort 7th iteration a[ix[i]]= { 1,2,3,4,5,6,6,9 } ix[] = { 7,2,5,0,6,1,4,3 }

Thus, the result of index sorting increases as follows:

 // ix: 0 1 2 3 4 5 6 7 a[] = { 4,6,2,9,6,3,5,1 } ix[] = { 7,2,5,0,6,1,4,3 }

The original array remains unchanged, only the index array changes. Elements a[ix[i]] , where i=0,1,2,3... are sorted in ascending order.

So now, if x=4 on this interval, you need to find (search in the bean) which i has the smallest, but still a[ix[i]]>=x like this:

 // ix: 0 1 2 3 4 5 6 7 a[] = { 4,6,2,9,6,3,5,1 } ix[] = { 7,2,5,0,6,1,4,3 } a[ix[i]]= { 1,2,3,4,5,6,6,9 } // * i = 3; m=8; count = mi = 8-3 = 5;

So answer 5 : >=4

[Edit2] Just to be sure you know what binary search means for this

 i=0; // init value marked by `*` j=4; // max power of 2 < m , i+j is marked by `^` // ix: 0 1 2 3 4 5 6 7 ij i+ja[ix[i+j]] a[ix[i]]= { 1,2,3,4,5,6,6,9 } 0 4 4 5>=4 j>>=1; * ^ a[ix[i]]= { 1,2,3,4,5,6,6,9 } 0 2 2 3< 4 -> i+=j; j>>=1; * ^ a[ix[i]]= { 1,2,3,4,5,6,6,9 } 2 1 3 4>=4 j>>=1; * ^ a[ix[i]]= { 1,2,3,4,5,6,6,9 } 2 0 -> stop * a[ix[i]] < x -> a[ix[i+1]] >= x -> i = 2+1 = 3 in O(log(m))

so you need the index i and the binary bitmask j (degree 2). First, set i with zero and j , and the greatest power 2 is even less than n (or in this case m ). Fro try on something like this:

 i=0; for (j=1;j<=m;j<<=1;); j>>=1;

Now in each iteration test, if a[ix[i+j]] , the search condition is sufficient or not. If so, update i+=j otherwise leave it as is. After that, go to the next bit so j>>=1 , and if j==0 stop still repeat the iteration. at the end, you find the value a[ix[i]] , and index i in log2(m) iterations, which are also the number of bits needed to represent m-1 .

In the above example, I use the condition a[ix[i]]<4 , so the value found was the largest number <4 in the array. since we also needed to include 4 , then I just increment the index once at the end (I could use <=4 , but was too lazy to rewrite all this again).

The number of such elements is simply the number of elements in the array (or interval) minus i .

+1

Spektre Sep 7 '16 at 7:43

source share

This is a special version of orthogonal 2D range calculation queries. Each element el[i] converted to a point on the plane (i, el[i]) and the query (i,j,x) can be converted to count all the points in the rectangle [i,j] x [x, +infty] .

You can use 2D Range Trees (for example: http://www.cs.uu.nl/docs/vakken/ga/slides5b.pdf ) for this type of query.

A simple idea is to have a tree that stores points in the leaves (each leaf contains one point), ordered along the X axis. Each internal node of the tree contains an additional tree that stores all the points from the subtree (ordered on the Y axis). Usable space O(n logn)

A simple version can count in O(log^2 n) time, but using fractional cascading it can be reduced to O(log n) .

The best Chazelle solution in 1988 ( https://www.cs.princeton.edu/~chazelle/pubs/FunctionalDataStructures.pdf ) before O(n) preprocessing and O(log n) query time.

You may find some solutions with better query time, but they are much more complicated.

0

Tomekw Sep 09 '16 at 12:28

source share

I would try to give you a simple approach.

You must have studied merge sorting. In merge sorting, we continue to divide the array into a subarray and then collect it back, but we do not store the sorted subarrays in this approach, we store them as nodes of a binary tree.

it takes a lot of space and time to build up; now for each query you just need to find a subarray, this will be done on average in logn and, in the worst case, logn ^ 2.

These trees are also known as fenwick trees. If you need simple code, I can provide this to you.

0

Aditya upadhyay Apr 08 '18 at 14:26

source share

The previous answer describes a standalone solution using the Fenwick tree, but this problem can be solved on the Internet (and even when updating the array) with slightly worse complexity. I will describe such a solution using the segment tree and the AVL tree (any self-balancing BST can do the trick).

First, let's see how to solve this problem using the segment tree. We will do this by storing the actual elements of the array in each node over the range that it covers. Thus, for the array A = [9, 4, 5, 6, 1, 3, 2, 8] we will have:

 [9 4 5 6 1 3 2 8] Node 1 [9 4 5 6] [1 3 2 8] Node 2-3 [9 4] [5 6] [1 3] [2 8] Node 4-7 [9] [4] [5] [6] [1] [3] [2] [8] Node 8-15

Since the height of our segment tree is log(n) and at each level we store n elements, the total amount of memory used is n log(n) .

The next step is to sort these arrays, which looks like this:

 [1 2 3 4 5 6 8 9] Node 1 [4 5 6 9] [1 2 3 8] Node 2-3 [4 9] [5 6] [1 3] [2 8] Node 4-7 [9] [4] [5] [6] [1] [3] [2] [8] Node 8-15

NOTE. First you need to build a tree, and then sort it in order to preserve the order of the elements in the original array.

Now we can run our range queries, and this works basically the same as in a regular segment tree, except that when we find a completely overlapping interval, we additionally check the number of elements greater than X. This can be done using binary search in the journal. (n) time by finding the index of the first element grater and then X and subtracting it from the number of elements in this interval.

Suppose our query was (0, 5, 4) , so we search for segments on the interval [0, 5] and as a result we get arrays: [4, 5, 6, 9], [1, 3] . Then we perform a binary search on these arrays to see the number of elements greater than 4, then we get 3 (from the first array) and 0 (from the second), which ultimately gives 3 - our answer to the request.

An interval search in segment trees can have up to log(n) paths, which means log(n) arrays, and since we perform a binary search on each of them, this complicates the log^2(n) each request.

Now, if we want to update the array, since we use segment trees, it is impossible to effectively add / remove elements, but we can replace them. Using AVL trees (or other binary trees that allow replacing and searching by time log (n)) as nodes and storing arrays, we can control this operation at the same time (replacing time with log(n) ).

0

Stevo mitric May 19 '19 at 13:32

source share

Pham trung · Accepted Answer · 2016-09-07T09:05:12+0000

If we know all the requests before the operation, we can solve this problem using the Fenwick tree .

First we need to sort all the elements in the array and queries together based on their values.

So, assuming that we have an array of [5, 4, 2, 1, 3] and queries (0, 1, 6) and (2, 5, 2), after sorting we get the following result: [1, 2, 2, 3, 4, 5, 6]

Now we need to process each element in descending order:

If we meet an element from an array, we will update its index in the Fenwick tree, which will take O (log n)
If we are faced with queries, we need to check in this query range how many elements have been added to the tree that accept O (log n).

In the above example, the process will look like this:

1st element is a query for value 6, as Fenwick tree is empty -> result is 0 2nd is element 5 -> add index 0 into Fenwick tree 3rd element is 4 -> add index 1 into tree. 4th element is 3 -> add index 4 into tree. 5th element is 2 -> add index 2 into tree. 6th element is query for range (2, 5), we query the tree and get answer 2. 7th element is 1 -> add index 3 into tree. Finish.

Thus, the time complexity for our solution is O ((m + n) log (m + n)) with m, and n is the number of queries and the number of elements from the input array, respectively.

Find the number of elements greater than x in a given range

More articles: