Fastest implementation of a mutable heap in C++

I'm currently looking for the fastest data structure in C++ that matches my requirements:

  • I start with a few million records to insert.
  • At each iteration, I want to look at the maximum element and update around 10 other elements. I could get by with decrease-key only, but I would prefer full updates (both increase and decrease).

I do not need delete/insert (except for the initial construction) or anything else. I thought a heap would be the natural choice, but after studying the STL I found that most of its data structures do not support updating an element's key (which is the important part). The workaround would be to erase and re-insert elements, which seems quite slow (it is the bottleneck of my program). I then looked at the heaps provided by Boost and found that pairing_heap gave me the best results among them. However, all of the heaps were still slower than the erase/insert approach on a multimap. Does anyone have a suggestion for another approach/implementation I could try?

Many thanks.

For completeness, here are my current timings:

  • STL multimap (erase/insert): ~70 seconds
  • Boost Fibonacci heap: ~110 seconds
  • Boost d-ary heap (best choice: d = 150): ~120 seconds
  • Boost pairing heap: ~90 seconds
  • Boost skew heap: ~105 seconds

My post has been edited to clarify a few things:

  • My keys are doubles (the key is a double; each one has some arbitrary value associated with it), so I don't think hashing is a good idea.
  • I originally talked about a priority queue, which was wrong. The first implementation actually used a multimap, where the values would be erased and then re-inserted (with the new key). I have updated my post. Sorry for the confusion.
  • I do not see how std::make_heap could solve this problem.
  • To update elements, I keep a separate lookup table in which I store each element's handle/descriptor. With it, I can update an element directly without searching for it.
1 answer

Reduced Generality

I tried my hand at this because it sounded interesting, and I can probably use such a structure myself. It is usually extremely difficult to beat professionally written standard libraries unless you can narrow your assumptions down, limiting the generality (and possibly the reliability) of your solution far more than they do.

For example, it is extremely difficult to beat malloc and free if your goal is simply to write a memory allocator as general-purpose and robust as malloc. However, if you make a lot of assumptions about a particular use case, it is easy to write an allocator that outperforms them for just that case.

That is why, when you have questions like this about data structures, I suggest offering as much information as possible about the specific details of your particular use case (what you need from the structure, such as iteration, sorting, searching, removal from the middle, front insertion, the range of your keys, the types involved, etc.). Some test code (even pseudocode) of what you want to do helps. We want as many of those narrowing assumptions as possible.

Sub-Par Search Algorithms for Highly Dynamic Content

In your case, we have a particularly dynamic use of a large heap-like structure. I come from an old-school gaming background, and in such highly dynamic cases a cruder search algorithm often works better than one with superior algorithmic complexity, since that search superiority often comes at the price of a more expensive build/update process (slower insertions/removals).

For example, if you have many sprites moving around every frame in a 2D game, a crudely written, algorithmically inferior fixed-grid accelerator for nearest-neighbor searches may work better in practice than an algorithmically superior, professionally written quadtree, since the cost of constantly moving things around and rebalancing the tree can produce overhead that outweighs the tree's excellent search speed and theoretical logarithmic complexity. The fixed grid has a pathological case where all the sprites cluster in one area, but that rarely happens in practice.

So I took that kind of strategy here. I make some assumptions, the biggest of which is that your keys are reasonably distributed across a range. Another is that you can approximate the maximum number of elements the container has to handle (you can stay under that number, but it works best with some rough advance knowledge of what to expect). I did not bother to provide iterators for traversing the container (that is possible if you want, but the keys would not come out perfectly sorted the way they do with std::multimap; they are ordered differently). It does, however, offer removal from the middle in addition to popping the element with the minimum key. My solution has a pathological case that std::multimap does not: if the keys of all your millions of elements have approximately the same tiny value (e.g. 0.000001, 0.00000012, 0.000000011, etc.), it degrades into a linear search over all the elements and performs much worse than multimap.

But I ended up with a solution that is about 8 times faster than std::multimap, provided my assumptions match your use case.

Note: this is quickly written code, produced with a lot of quick-and-dirty profiling and micro-optimization, down to providing a pool allocator and manipulating things at the bit and byte level using alignment assumptions (it uses maximal alignment and assumes that is "portable enough"). It also does not worry about things like exception safety. However, it should be safe to use with C++ objects.

As a test case, I created a million random keys and then kept popping the minimum key, modifying it, and re-inserting it. I did this both for multimap and for my structure to compare performance.

Balanced Distributed Heap / Priority Queue (Kinda)

 #include <iostream>
 #include <cassert>
 #include <utility>
 #include <stdexcept>
 #include <algorithm>
 #include <cmath>
 #include <cstdlib>
 #include <ctime>
 #include <map>
 #include <new>
 #include <vector>
 #include <malloc.h>

 // Max Alignment
 #if defined(_MSC_VER)
     #define MAX_ALIGN __declspec(align(16))
 #else
     #define MAX_ALIGN __attribute__((aligned(16)))
 #endif

 using namespace std;

 static void* max_malloc(size_t amount)
 {
     #ifdef _MSC_VER
         return _aligned_malloc(amount, 16);
     #else
         void* mem = 0;
         posix_memalign(&mem, 16, amount);
         return mem;
     #endif
 }

 static void max_free(void* mem)
 {
     #ifdef _MSC_VER
         return _aligned_free(mem);
     #else
         free(mem);
     #endif
 }

 // Balanced priority queue for very quick insertions and
 // removals when the keys are balanced across a distributed range.
 template <class Key, class Value, class KeyToIndex>
 class BalancedQueue
 {
 public:
     enum {zone_len = 256};

     /// Creates a queue with 'n' buckets.
     explicit BalancedQueue(int n):
         num_nodes(0), num_buckets(n+1), min_bucket(n+1),
         buckets(static_cast<Bucket*>(max_malloc((n+1) * sizeof(Bucket)))),
         free_nodes(0), pools(0)
     {
         const int num_zones = num_buckets / zone_len + 1;
         zone_counts = new int[num_zones];
         for (int j=0; j < num_zones; ++j)
             zone_counts[j] = 0;

         for (int j=0; j < num_buckets; ++j)
         {
             buckets[j].num = 0;
             buckets[j].head = 0;
         }
     }

     /// Destroys the queue.
     ~BalancedQueue()
     {
         clear();
         max_free(buckets);
         while (pools)
         {
             Pool* to_free = pools;
             pools = pools->next;
             max_free(to_free);
         }
         delete[] zone_counts;
     }

     /// Makes the queue empty.
     void clear()
     {
         const int num_zones = num_buckets / zone_len + 1;
         for (int j=0; j < num_zones; ++j)
             zone_counts[j] = 0;
         for (int j=0; j < num_buckets; ++j)
         {
             while (buckets[j].head)
             {
                 Node* to_free = buckets[j].head;
                 buckets[j].head = buckets[j].head->next;
                 node_free(to_free);
             }
             buckets[j].num = 0;
         }
         num_nodes = 0;
         min_bucket = num_buckets+1;
     }

     /// Pushes an element to the queue.
     void push(const Key& key, const Value& value)
     {
         const int index = KeyToIndex()(key);
         assert(index >= 0 && index < num_buckets && "Key is out of range!");

         Node* new_node = node_alloc();
         new (&new_node->key) Key(key);
         new (&new_node->value) Value(value);
         new_node->next = buckets[index].head;
         buckets[index].head = new_node;
         assert(new_node->key == key && new_node->value == value);

         ++num_nodes;
         ++buckets[index].num;
         ++zone_counts[index/zone_len];
         min_bucket = std::min(min_bucket, index);
     }

     /// @return size() == 0.
     bool empty() const
     {
         return num_nodes == 0;
     }

     /// @return The number of elements in the queue.
     int size() const
     {
         return num_nodes;
     }

     /// Pops the element with the minimum key from the queue.
     std::pair<Key, Value> pop()
     {
         assert(!empty() && "Queue is empty!");
         for (int j=min_bucket; j < num_buckets; ++j)
         {
             if (buckets[j].head)
             {
                 Node* node = buckets[j].head;
                 Node* prev_node = node;
                 Node* min_node = node;
                 Node* prev_min_node = 0;
                 const Key* min_key = &min_node->key;
                 const Value* min_val = &min_node->value;

                 for (node = node->next; node; prev_node = node, node = node->next)
                 {
                     if (node->key < *min_key)
                     {
                         prev_min_node = prev_node;
                         min_node = node;
                         min_key = &min_node->key;
                         min_val = &min_node->value;
                     }
                 }

                 std::pair<Key, Value> kv(*min_key, *min_val);
                 if (min_node == buckets[j].head)
                     buckets[j].head = buckets[j].head->next;
                 else
                 {
                     assert(prev_min_node);
                     prev_min_node->next = min_node->next;
                 }
                 removed_node(j);
                 node_free(min_node);
                 return kv;
             }
         }
         throw std::runtime_error("Trying to pop from an empty queue.");
     }

     /// Erases an element from the middle of the queue.
     /// @return True if the element was found and removed.
     bool erase(const Key& key, const Value& value)
     {
         assert(!empty() && "Queue is empty!");
         const int index = KeyToIndex()(key);
         if (buckets[index].head)
         {
             Node* node = buckets[index].head;
             if (node->key == key && node->value == value)
             {
                 buckets[index].head = buckets[index].head->next;
                 removed_node(index);
                 node_free(node);
                 return true;
             }
             Node* prev_node = node;
             for (node = node->next; node; prev_node = node, node = node->next)
             {
                 if (node->key == key && node->value == value)
                 {
                     prev_node->next = node->next;
                     removed_node(index);
                     node_free(node);
                     return true;
                 }
             }
         }
         return false;
     }

 private:
     // Didn't bother to make it copyable -- left as an exercise.
     BalancedQueue(const BalancedQueue&);
     BalancedQueue& operator=(const BalancedQueue&);

     struct Node
     {
         Key key;
         Value value;
         Node* next;
     };
     struct Bucket
     {
         int num;
         Node* head;
     };
     struct Pool
     {
         Pool* next;
         MAX_ALIGN char buf[1];
     };

     Node* node_alloc()
     {
         if (free_nodes)
         {
             Node* node = free_nodes;
             free_nodes = free_nodes->next;
             return node;
         }
         const int pool_size = std::max(4096, static_cast<int>(sizeof(Node)));
         Pool* new_pool = static_cast<Pool*>(max_malloc(sizeof(Pool) + pool_size - 1));
         new_pool->next = pools;
         pools = new_pool;

         // Push the new pool nodes to the free stack.
         for (int j=0; j < pool_size; j += sizeof(Node))
         {
             Node* node = reinterpret_cast<Node*>(new_pool->buf + j);
             node->next = free_nodes;
             free_nodes = node;
         }
         return node_alloc();
     }

     void node_free(Node* node)
     {
         // Destroy the key and value and push the node back to the free stack.
         node->key.~Key();
         node->value.~Value();
         node->next = free_nodes;
         free_nodes = node;
     }

     void removed_node(int bucket_index)
     {
         --num_nodes;
         --zone_counts[bucket_index/zone_len];
         if (--buckets[bucket_index].num == 0 && bucket_index == min_bucket)
         {
             // If the bucket became empty, search for the next occupied minimum zone.
             const int num_zones = num_buckets / zone_len + 1;
             for (int j=bucket_index/zone_len; j < num_zones; ++j)
             {
                 if (zone_counts[j] > 0)
                 {
                     for (min_bucket=j*zone_len; min_bucket < num_buckets && buckets[min_bucket].num == 0; ++min_bucket) {}
                     assert(min_bucket/zone_len == j);
                     return;
                 }
             }
             min_bucket = num_buckets+1;
             assert(empty());
         }
     }

     int* zone_counts;
     int num_nodes;
     int num_buckets;
     int min_bucket;
     Bucket* buckets;
     Node* free_nodes;
     Pool* pools;
 };

 /// Test Parameters
 enum {num_keys = 1000000};
 enum {buckets = 100000};

 static double sys_time()
 {
     return static_cast<double>(clock()) / CLOCKS_PER_SEC;
 }

 struct KeyToIndex
 {
     int operator()(double val) const
     {
         return static_cast<int>(val * buckets);
     }
 };

 int main()
 {
     vector<double> keys(num_keys);
     for (int j=0; j < num_keys; ++j)
         keys[j] = static_cast<double>(rand()) / RAND_MAX;

     for (int k=0; k < 5; ++k)
     {
         // Multimap
         {
             const double start_time = sys_time();
             multimap<double, int> q;
             for (int j=0; j < num_keys; ++j)
                 q.insert(make_pair(keys[j], j));

             // Pop each minimum key, modify it, and reinsert.
             for (int j=0; j < num_keys; ++j)
             {
                 pair<double, int> top = *q.begin();
                 q.erase(q.begin());
                 top.first = static_cast<double>(rand()) / RAND_MAX;
                 q.insert(top);
             }
             cout << (sys_time() - start_time) << " secs for multimap" << endl;
         }

         // Balanced Queue
         {
             const double start_time = sys_time();
             BalancedQueue<double, int, KeyToIndex> q(buckets);
             for (int j=0; j < num_keys; ++j)
                 q.push(keys[j], j);

             // Pop each minimum key, modify it, and reinsert.
             for (int j=0; j < num_keys; ++j)
             {
                 pair<double, int> top = q.pop();
                 top.first = static_cast<double>(rand()) / RAND_MAX;
                 q.push(top.first, top.second);
             }
             cout << (sys_time() - start_time) << " secs for BalancedQueue" << endl;
         }
         cout << endl;
     }
 }

Results on my machine:

 3.023 secs for multimap
 0.34 secs for BalancedQueue

 2.807 secs for multimap
 0.351 secs for BalancedQueue

 2.771 secs for multimap
 0.337 secs for BalancedQueue

 2.752 secs for multimap
 0.338 secs for BalancedQueue

 2.742 secs for multimap
 0.334 secs for BalancedQueue
