What data structure allows you to find the number of elements in a given range in O (log n) time?

Question

What data structure allows you to find the number of elements in a given range in O (log n) time?

I solve the problem, and I realized that I need a data structure with the following properties, but I can not find it even after several hours of work on the Internet. I believe the STL library is too rich to not have this, so ask here.

Insert any element (must contain duplicate) in O(log(n)) time
Remove the item in O(log(n)) time.
If I want to request the number of elements in the range [a, b], I should get this count in O(log(n)) time ..

If I wrote it from scratch, for parts 1 and 2 I would use set or multiset , and I would change their find() method (which works in O(log(n)) time) to return indexes instead of iterators so that I could do abs(find(a)-find(b)) , so I get the number of elements in log (N) time. But unfortunately for me, find() returns an iterator.

I looked at multiset() and I could not fulfill requirement 3 in O(log(n)) time. O(n) required.

Any tips to make this easy?

+7

c ++ algorithm stl

Anoop Sep 22 '15 at 23:47

source share

5 answers

Although the standard library is really well known, I don’t think you will find anything with these specific requirements. As you noted, the selected structures return non-random access iterators - considerable complexity will arise to provide random access (or any distance function, as necessary).

You can achieve your goal by introducing an indexable skip list that provides O (log (n)) installation, removal, and indexing, as described here: https://en.wikipedia.org/wiki/Skip_list#Indexable_skiplist

Implementation guidance can be found here: http://cg.scs.carleton.ca/~morin/teaching/5408/refs/p90b.pdf

+3

Jack o'reilly Sep 23 '15 at 0:07

source share

Two obvious data structures for this task are a list of passes (which Jack O'Reilly already mentioned) and a variant of the order statistics tree (which Behzad mentions, but does not actually explain).

The order statistics tree stores one additional piece of information in each node. You can store any of several things, but the one that is easiest for me to understand is that each node stores the number of elements in its left subtree.

When you insert, when you go through the tree to store the item, you increment the counter every time you go down to the left in the tree. Since you only modify the nodes you went through anyway, this does not change the O(log N) insert. When you rebalance, you need to adjust accordingly (but, again, you only change the number of samples in the nodes that you already change during turns, so (again) you do not affect the overall complexity.

When you need to find the distance from one element to another, you simply find two nodes, each with complexity O (log N). You get the index of each element in the tree, as you find it, initializing the index from the root and then updating it from there (subtract the counter when you go down to the left, add when you go down to the right).

+3

Jerry Coffin Sep 23 '15 at 0:25

source share

You should be able to accomplish this with a standard or slightly modified B-Tree.

Typically, the standard operations are O (log (n)) for B-Tree implementations.

+1

Erix Sep 23 '15 at 0:09

source share

This, of course, is not the most efficient space giving O (3n), but it satisfies the insertion, removal, and distance criteria listed above. The following uses a linked list, map, and unordered_map.

A list is used to maintain order. Usually inserting into a list in order is linear time, but using the key map to list_iterator we can insert constant time. The advantage of storing ordered data in a list is that it uses random access iterators, which means you can get constant time std :: distance call

The map is used to get hints about where to insert a node in the list. This is done by creating a new map entry for a given key, and then decreasing the iterator by 1. The value of this new iterator gives us the corresponding insertion position in the linked list.

Unordered_map gives us o (1) a search for random access list iterators, allowing us to get the delete time (logn) and the time at a distance.

 class logn { public: using list_it = std::list<int>::iterator; void insert(int i) { const bool first = list.empty(); auto original_it = map.insert({i,list.end()}).first; // o(logn) auto hint = original_it; if (!first) --hint; umap[i] = list.insert(hint->second,i); // o(1) original_it->second = umap[i]; } void remove(int i) { auto it = umap.find(i); // o(1) list.erase(it->second); // o(1) umap.erase(it); // o(1) map.erase(map.find(i)); // o(logn) } list_it get(int i) const { return umap.find(i)->second; } unsigned int distance(int lower, int upper) const { return std::distance(get(lower), get(upper)); } private: std::list<int> list; std::unordered_map<int, list_it> umap; std::map<int, list_it> map; };

0

makar Feb 07 '16 at 22:13

source share

behzad.nouri · Accepted Answer · 2015-09-23T00:12:25+0000

Although it is not part of the STL, you can use the "Policy-based Data Structures" , which are part of the gcc extensions; In particular, you can initialize the statistics tree, as shown below. The code compiles with gcc without any external libraries:

 #include<iostream> #include<ext/pb_ds/assoc_container.hpp> #include<ext/pb_ds/tree_policy.hpp> using namespace __gnu_pbds; using namespace std; int main() { tree<int, /* key */ null_type, /* mapped */ less<int>, /* compare function */ rb_tree_tag, /* red-black tree tag */ tree_order_statistics_node_update> tr; for(int i = 0; i < 20; ++i) tr.insert(i); /* number of elements in the range [3, 10) */ cout << tr.order_of_key(10) - tr.order_of_key(3); }

What data structure allows you to find the number of elements in a given range in O (log n) time?

More articles: