How to adapt the Fenwick tree to respond to minimal range requests

Fenwick tree is a data structure that provides an effective response to basic queries:

  • add an element to a specific index of the array update(index, value)
  • find the sum of elements from 1 to N find(n)

both operations are performed in O(log(n)) time, and I understand the logic and implementation. It is easy to implement many other operations, such as finding the sum from N to M.

I wanted to understand how to adapt the Fenwick tree for RMQ. Obviously, to change the Fenwick tree for the first two operations. But I can’t figure out how to find a minimum in the range from N to M.

After finding solutions, most people think that this is impossible, and a small minority claims that this can really be done ( approach1 , approach2 ).

The first approach (written in Russian, based on my google translation, has 0 explanations and only two functions) relies on three arrays (initial, left and right) when my tests did not work correctly for all possible test cases.

The second approach requires only one array and is based on applications executed in O(log^2(n)) , and also does not explain why and how it should work. I did not try to test it.


In the light of conflicting statements, I wanted to find out if the Fenwick tree can be expanded to respond with update(index, value) and findMin(from, to) .

If possible, I would be happy to hear how it works.

+7
algorithm fenwick-tree
source share
2 answers

Yes, you can adapt Fenwick trees (binary indexed trees) to

  • Update value at given index in O (log n)
  • Minimum query value for a range in O (log n) (amortized)

We need 2 Fenwick trees and an additional array containing real values ​​for the nodes.

Suppose we have the following array:

 index 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 value 1 0 2 1 1 3 0 4 2 5 2 2 3 1 0 

The wave of the magic wand and the following trees appear:

Fenwick trees for an example problem

Note that in both trees, each node represents the minimum value for all nodes inside this subtree. For example, in BIT2, node 12 has a value of 0, which is the minimum value for nodes 12,13,14,15.

Inquiries

We can efficiently query the minimum value for any range by calculating a minimum of several subtree values ​​and another real node value. For example, the minimum value for the range [2,7] can be determined by accepting the minimum value BIT2_Node2 (representing nodes 2,3) and BIT1_Node7 (representing nodes 7), BIT1_Node6 (representing nodes 5,6) and REAL_4 - therefore it covers all nodes in [ 2.7]. But how do we know which trees we want to look under?

 Query(int a, int b) { int val = infinity // always holds the known min value for our range // Start traversing the first tree, BIT1, from the beginning of range, a int i = a while (parentOf(i, BIT1) <= b) { val = min(val, BIT2[i]) // Note: traversing BIT1, yet looking up values in BIT2 i = parentOf(i, BIT1) } // Start traversing the second tree, BIT2, from the end of range, b i = b while (parentOf(i, BIT2) >= a) { val = min(val, BIT1[i]) // Note: traversing BIT2, yet looking up values in BIT1 i = parentOf(i, BIT2) } val = min(val, REAL[i]) // Explained below return val } 

It can be mathematically proved that both workarounds will end in the same node. This node is part of our range, but it is not part of any subtrees we were looking at. Imagine a case where the (unique) smallest value of our range is in this special node. If we were not looking, our algorithm would give incorrect results. This is why we should do this in a real array of values.

To understand the algorithm, I suggest you simulate it with a pen and paper, looking at the data in the examples above. For example, a range query [4.14] will return a minimum of BIT2_4 (number 4,5,6,7), BIT1_14 (representative 13,14), BIT1_12 (rep. 9,10,11, 12) and REAL_8, therefore, cover all possible values ​​[4, 14].

Update

Since a node is the minimum value of itself and its children, changing the node will affect its parents, but not its children. Therefore, to update the tree, we start with node, we modify and move the entire path to the fictitious root of the node (0 or N + 1, depending on which tree).

Suppose we update some node in some tree:

  • If the new value <the old value, we will always overwrite the value and move up
  • If the new value == the old value, we can stop because there will be no more changes cascading up
  • If the new value> the old value, everything becomes interesting.

    • If the old value still exists somewhere inside this subtree, we do
    • If not, we must find a new minimum value between the real [node] and each tree [child_of_node], change the tree [node] and go up

Pseudocode for updating node with value v in the tree:

 while (node <= n+1) { if (v > tree[node]) { if (oldValue == tree[node]) { v = min(v, real[node]) for-each child { v = min(v, tree[child]) } } else break } if (v == tree[node]) break tree[node] = v node = parentOf(node, tree) } 

Note that oldValue is the original value that we replaced, while v can be reassigned several times as we move through the tree.

Binary indexing

In my experiments, the Minimum range queries were about twice as fast as the Segment Tree implementation, and the updates were slightly faster. The main reason for this is the use of super-efficient bitwise operations to move between nodes. They are very well described here . Segment trees are really easy to code, so think about how beneficial this performance advantage is. The update method of my Fenwick RMQ is 40 lines and it took some time to debug. If someone wants my code, I can put it on github. I also created tape and test generators to make sure everything works.

I helped to understand this topic and implement it from the community of Finnish algorithms. The source of the image is http://ioinformatics.org/oi/pdf/v9_2015_39_44.pdf , but they attribute the 1994 Fenwick document to it.

+8
source share

The Fenwick tree structure works for addition because the addition is reversible. This does not work at least, because as soon as you have a cell that must have at least two or more inputs, you have lost information potentially.

If you want to double your storage requirements, you can support RMQ with a segment tree that is built implicitly like a binary heap. For RMQ with n values, store n values ​​in the [n, 2n) places of the array. Locations [1, n) are aggregates with the formula A (k) = min (A (2k), A (2k + 1)). Place 2n is an endless guardian. The update procedure should look something like this.

 def update(n, a, i, x): # value[i] = x i += n a[i] = x # update the aggregates while i > 1: i //= 2 a[i] = min(a[2*i], a[2*i+1]) 

Here, multiplication and division can be replaced by shifts for efficiency.

The RMQ pseudocode is more delicate. Here is another unverified and non-optimized procedure.

 def rmq(n, a, i, j): # min(value[i:j]) i += n j += n x = inf while i < j: if i%2 == 0: i //= 2 else: x = min(x, a[i]) i = i//2 + 1 if j%2 == 0: j //= 2 else: x = min(x, a[j-1]) j //= 2 return x 
0
source share

All Articles