When do I need to select an RB tree, a B-Tree, or AVL?

As a programmer, when should you consider using the RB tree, B tree, or AVL tree? What are the key points to consider before deciding on a choice?

Can someone explain the scenario for each tree structure, why is it selected above others with reference to key points?

+83
data-structures tree b-tree avl-tree red-black-tree
Oct. 19 '09 at 15:58
source share
4 answers

Take this with a pinch of salt:

B-tree, when you control over thousands of items and you upload them from disk or onto some kind of slow media.

RB tree, when you do quite frequent insertions, deletions and searches on the tree.

The AVL tree when your insertions and deletions are infrequent with respect to your searches.

+106
Oct 19 '09 at 16:02
source share

I think B + trees are a good data structure for an ordered general purpose container, even in main memory. Even when virtual memory is not a problem, the need for caching often arises, and the B + trees are especially good for sequential access - the same asymptotic performance as a linked list, but with the convenience of caching close to a simple array. All this and O (log n) search, insert and delete.

There are problems in B + trees β€” for example, elements moving inside nodes when you insert / delete, invalid pointers to these elements. I have a container library that does "cursor maintenance" - cursors join the node sheet that they currently reference in a linked list, so they can be fixed or canceled automatically. Since rarely there are more than one or two cursors, it works well - but it's an extra bit of work anyway.

Another thing is that the B + tree is essentially just that. I assume that you can disable or recreate non-leaf nodes depending on whether you need them or not, but with binary tree nodes you get much more flexibility. The binary tree can be converted to a linked list and vice versa without copying the nodes - you just change the pointers, and then remember that you now consider it as a different data structure. Among other things, this means that you easily get O (n) merging trees - convert both trees to lists, combine them, and then convert them back to a tree.

Another thing is allocating and freeing memory. In the binary tree, this can be separated from the algorithms - the user can create a node, then invoke the insert algorithm, and you can remove nodes (remove them from the tree, but not free up memory). In a B-tree or B + -tree this obviously does not work - the data will live in a multi-position element node. Writing insertion methods that β€œplan” an operation without changing nodes until they know how many new nodes are needed and what they can be allocated is a problem.

Red black vs AVL? I am not sure that this is of great importance. My own library has a policy-based β€œtool” class for managing nodes, with methods for doubly linked lists, simple binary trees, markup trees, red-black trees, and skins, including various transformations. Some of these methods were implemented only because I was bored at one time or another. I'm not sure I even tested the treap methods. The reason I chose red-black trees, rather than AVL, is because I personally understand the algorithms better - this does not mean that they are simpler, it is just an accident of the story with which I am familiar with them.

Last: I only originally designed my B + tree containers as an experiment. This is one of those experiments that never actually ended, but this is not something I would encourage others to repeat. If all you need is an ordered container, the best answer is to use the one your existing library provides. std :: map, etc. in C ++. My library developed over the years, and it took quite a while to get it stable, and I recently recently discovered it is technically not portable (depending on the W90T bit).

+19
Nov 04 '09 at 23:19
source share

In memory, B-Tree has the advantage of having more than 32,000 elements ... See speedtest.pdf from stx-btree .

+4
Jan 08 '14 at 0:20
source share

When choosing data structures, you trade factors such as

  • extraction speed v update rate
  • how well the structure handles the worst operations, for example, inserting records that arrive in sorted order
  • gaps in spending

I would start by reading the Wikipedia articles referenced by Robert Harvey.

Pragmatically, when working in languages ​​such as Java, the average programmer tends to use the collection classes provided. If the performance tuning process reveals that collection performance is problematic, you can look for alternative implementations. This is rarely the first thing to consider when developing a business. It is extremely rare that you need to implement such data structures manually; libraries that can be used are usually used.

0
Oct 19 '09 at 16:13
source share



All Articles