Range query for a semigroup operator (union)

I am looking to implement an algorithm that, given an array of integers and a list of ranges (intervals) into that array, returns the number of distinct elements in each interval. That is, given an array A and a range [i, j], return the size of the set {A[i], A[i+1], ..., A[j]}.

Obviously, the naive approach (iterating from i to j and counting while ignoring duplicates) is too slow. A range-sum approach does not seem to apply, since (A ∪ B) − B is not always equal to A.

I looked at the Range Queries article on Wikipedia, and it hints that Yao (in '82) gave an algorithm that does this for semigroup operators (which union is) with linear preprocessing time and space and almost-constant query time. Unfortunately, the paper is not freely available.

Edit: this exact problem seems to be available at http://www.spoj.com/problems/DQUERY/

3 answers

There is a fairly simple algorithm that uses O(N log N) time and space for preprocessing and O(log N) time per query.

First, build a persistent segment tree that can answer range-sum queries (initially it contains zeros in all positions). Then iterate over the elements of the array while keeping track of the last position of each value. At each iteration, create a new version of the persistent segment tree by placing a 1 at the last position of the current element. At each iteration only one element's position needs to be updated, so only one value in the segment tree changes, and the update can be done in O(log N).

To answer a query (l, r), take the sum over the segment (l, r) in the version of the tree that was created while processing element r of the original array. Hope this algorithm is fast enough.

Upd. There is a small mistake in my explanation: at each step the values of up to two positions in the segment tree can change (the previous last position of the current value has to be cleared as well). However, this does not change the complexity.
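A minimal Python sketch of this approach, assuming 0-based indices and queries with inclusive endpoints; all the names (Node, build, update, query, preprocess, distinct_in_range) and the version-numbering convention are mine, not part of the answer above:

# Persistent segment tree over positions 0..n-1; each prefix A[0..r] gets its
# own version in which position p holds 1 iff p is the last occurrence of the
# value A[p] within that prefix.

class Node:
    __slots__ = ("left", "right", "total")
    def __init__(self, left=None, right=None, total=0):
        self.left, self.right, self.total = left, right, total

def build(lo, hi):
    # All-zero tree over positions lo..hi.
    if lo == hi:
        return Node()
    mid = (lo + hi) // 2
    return Node(build(lo, mid), build(mid + 1, hi))

def update(node, lo, hi, pos, delta):
    # Return a new version with the value at pos changed by delta;
    # only O(log N) nodes are copied, the rest are shared.
    if lo == hi:
        return Node(total=node.total + delta)
    mid = (lo + hi) // 2
    if pos <= mid:
        return Node(update(node.left, lo, mid, pos, delta), node.right, node.total + delta)
    return Node(node.left, update(node.right, mid + 1, hi, pos, delta), node.total + delta)

def query(node, lo, hi, l, r):
    # Sum of the values in [l, r] within one version.
    if r < lo or hi < l:
        return 0
    if l <= lo and hi <= r:
        return node.total
    mid = (lo + hi) // 2
    return query(node.left, lo, mid, l, r) + query(node.right, mid + 1, hi, l, r)

def preprocess(A):
    n = len(A)
    versions = [build(0, n - 1)]
    last = {}                         # value -> last position seen so far
    cur = versions[0]
    for i, x in enumerate(A):
        if x in last:                 # clear the previous last occurrence
            cur = update(cur, 0, n - 1, last[x], -1)
        cur = update(cur, 0, n - 1, i, +1)
        last[x] = i
        versions.append(cur)          # versions[i + 1] corresponds to prefix A[0..i]
    return versions

def distinct_in_range(A, versions, l, r):
    # Number of distinct values in A[l..r], inclusive.
    return query(versions[r + 1], 0, len(A) - 1, l, r)

A = [1, 1, 2, 1, 3]
versions = preprocess(A)
print(distinct_in_range(A, versions, 1, 3))   # -> 2
print(distinct_in_range(A, versions, 0, 4))   # -> 3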


You can answer each of your queries in constant time by performing a quadratic-time precomputation:

for every i from 0 to n-1:
    S <- new empty set backed by a hash table
    C <- 0
    for every j from i to n-1:
        if A[j] does not belong to S:
            increment C and add A[j] to S
        store C as the answer for the query associated with interval i..j

This precomputation takes quadratic time, since for each interval we perform a bounded number of operations, each of which takes (expected) constant time (note that the set S is backed by a hash table), and there is a quadratic number of intervals.

If you do not have additional information about the queries (total number of queries, distribution of intervals), you cannot do much better, since the total number of intervals is already quadratic.

You can trade the quadratic precomputation for n linear computations performed on the fly: upon receiving a query of the form A[i..j], precompute (in O(n) time) the answers for all intervals A[i..k], k >= i. This keeps the total work quadratic in the worst case while avoiding the full quadratic precomputation up front.
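A short sketch of this lazy scheme, assuming queries arrive online; the class and method names are mine:

# Rows of the (implicit) quadratic answer table are filled only when a query
# first touches their starting index i; each fill is a single O(n) scan.

class LazyDistinct:
    def __init__(self, A):
        self.A = A
        self.rows = {}                # i -> answers for A[i..k], for every k >= i

    def _fill_row(self, i):
        seen, count, row = set(), 0, []
        for x in self.A[i:]:
            if x not in seen:
                seen.add(x)
                count += 1
            row.append(count)
        self.rows[i] = row

    def query(self, i, j):
        # Number of distinct values in A[i..j], inclusive.
        if i not in self.rows:
            self._fill_row(i)
        return self.rows[i][j - i]

q = LazyDistinct([1, 1, 2, 1, 3])
print(q.query(1, 3))   # -> 2
print(q.query(0, 4))   # -> 3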

Note that the naive algorithm (the one you call obvious in the question) is cubic if used to answer all intervals, since it fully scans each interval.


Here is another approach, closely related to a segment tree. Think of the elements of the array as the leaves of a complete binary tree. If there are 2^k elements in the array, the tree has k levels. At each internal node of the tree, store the union of the values lying in the leaves underneath it. Each number in the array appears at most once per level (fewer times if there are duplicates), so the cost in space is a factor of log n.

Consider a range A..B of length K. You can compute the union of values in this range by forming the union of the sets associated with certain leaves and internal nodes, selecting nodes as high up in the tree as possible, namely those whose subtrees are completely contained in the range. If you greedily select subtrees that are as large as possible, you will find that the subtree sizes first increase and then decrease, and that the number of subtrees needed grows only logarithmically with the range length: if at some point you can only take a subtree of size 2^j, it ends on a boundary divisible by 2^(j+1), so the next subtree you take can have size at least 2^(j+1) if the range is long enough.

Thus, the number of semigroup operations needed to answer a query is O(log n), but note that the semigroup operations themselves can be expensive, since each one may form the union of two large sets.
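A rough Python sketch of this idea, using an iterative segment-tree layout (which also handles array sizes that are not powers of two); the helper names are mine:

# Each node stores the set of values appearing in its subtree, giving
# O(n log n) set entries overall. A query unions the sets of the O(log n)
# canonical nodes covering the range -- each such union is one semigroup
# operation and may itself be expensive.

def build_set_tree(A):
    n = len(A)
    tree = [set() for _ in range(2 * n)]
    for i, x in enumerate(A):
        tree[n + i] = {x}
    for v in range(n - 1, 0, -1):
        tree[v] = tree[2 * v] | tree[2 * v + 1]
    return tree

def distinct_in_range(tree, n, l, r):
    # Number of distinct values in A[l..r], inclusive.
    result = set()
    l += n
    r += n + 1
    while l < r:
        if l & 1:
            result |= tree[l]
            l += 1
        if r & 1:
            r -= 1
            result |= tree[r]
        l >>= 1
        r >>= 1
    return len(result)

A = [1, 1, 2, 1, 3]
tree = build_set_tree(A)
print(distinct_in_range(tree, len(A), 1, 3))   # -> 2
print(distinct_in_range(tree, len(A), 0, 4))   # -> 3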

