Red-black trees

I've seen binary trees and binary search mentioned in several books I've read recently, but since I'm still at the beginning of my studies in computer science, I have yet to take a class that really covers algorithms and data structures.

I've checked the typical sources (Wikipedia, Google), but most descriptions of the usefulness and, in particular, the implementation of red-black trees come across as dense and hard to follow. I'm sure that for someone with the right background they make sense, but at the moment they read almost like a foreign language.

So, what makes binary trees useful in the common tasks you run into while programming? Beyond that, which trees do you prefer to use (and please include a sample implementation), and why?

+51
algorithm binary-tree red-black-tree
Aug 21 '08 at 18:37
13 answers

Red-black trees are good at staying well balanced. The main problem with plain binary search trees is that it is very easy to make them unbalanced. Imagine that your first number is 15, and every number inserted after that is smaller and smaller than 15. You will end up with a tree that is very heavy on the left side and has nothing on the right.
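
To make that degenerate case concrete, here is a minimal, illustrative (not production) BST insert in Python; feeding it strictly decreasing keys produces a left-leaning chain whose height equals the number of inserts:

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    """Plain BST insert with no rebalancing."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def height(root):
    if root is None:
        return 0
    return 1 + max(height(root.left), height(root.right))

root = None
for key in [15, 14, 13, 12, 11]:   # strictly decreasing input
    root = insert(root, key)

print(height(root))  # 5 -- every node hangs off the left: a linked list in disguise
```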

Red-black trees solve this by forcing your tree to rebalance whenever you insert or delete. This is accomplished through a series of rotations between ancestor and child nodes. The algorithm is actually fairly straightforward, although it is a bit long. I'd suggest picking up the CLRS textbook (Cormen, Leiserson, Rivest, and Stein), Introduction to Algorithms, and reading up on RB trees there.
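
The rotations mentioned above are short on their own; as a hedged sketch (the node layout here is my own, not from any particular textbook), a left rotation looks like this:

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def rotate_left(x):
    """Lift x's right child into x's place; x becomes its left child.
    The in-order ordering of the keys is preserved."""
    y = x.right
    x.right = y.left   # y's left subtree moves under x
    y.left = x
    return y           # y is the new subtree root

# A right-leaning chain 1 -> 2 -> 3 ...
root = Node(1, right=Node(2, right=Node(3)))
root = rotate_left(root)
print(root.key, root.left.key, root.right.key)  # 2 1 3 -- one level shorter
```

A right rotation is the mirror image; red-black insertion and deletion apply a small, fixed repertoire of these plus recoloring.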

The implementation is also not exactly short, so it probably isn't practical to include it here. Nevertheless, trees are used extensively in high-performance software that needs access to lots of data. They provide a very efficient way to find nodes, with relatively little overhead for insertion and deletion. Again, I'd suggest looking at CLRS to read up on how they're used.

While you may not use a BST explicitly, one example of trees in general is in almost every modern RDBMS. Likewise, your file system is almost certainly represented as some kind of tree structure, and files are indexed that way as well. Trees power Google. Trees power just about every website on the internet.

+52
Aug 21 '08 at 18:56

I'd like to address only the question "So, what makes binary trees useful in some of the common tasks you do while programming?"

This is a big topic that many people disagree on. Some say that the algorithms taught in a CS degree, such as binary search trees and directed graphs, see little use in day-to-day programming and are therefore irrelevant. Others counter that these algorithms and data structures are the foundation of all our programs, and that it is essential to understand them even if you never have to write one yourself. This spills over into debates about good interviewing and hiring practices. For example, Steve Yegge has an article on interviewing at Google that addresses this question. Keep that debate in mind; experienced people disagree.

In typical business programming you may not need to create binary trees, or even trees, very often. However, you will use many classes that operate internally on trees. Many of the core collection classes in every language use trees and hashes to store and access data.

If you are involved in higher-performance work, or in situations somewhat outside the norm of business programming, you'll find trees become your direct friend. As another poster said, trees are core data structures for databases and indexes of all kinds. They are useful in data mining and visualization, advanced graphics (2D and 3D), and a host of other computational tasks.

I have used binary trees in the form of BSP (binary space partitioning) trees in 3D graphics. I am currently looking at trees to sort large volumes of geocoded data and other data for information visualization in Flash/Flex applications. Whenever you push the hardware envelope, or want to run on lower-spec hardware, understanding and selecting the best algorithm can make the difference between failure and success.

+17
Aug 21 '08 at 19:27

None of the answers mention what it is that BSTs are actually good for.

If all you want to do is look things up by value, then a hash table is much faster: O(1) insert and lookup (amortized, for most hash tables).

A BST's lookup is O(log N), where N is the number of nodes in the tree; inserts are also O(log N).

RB and AVL trees are important, as another answer mentioned, because of this property: if a plain BST is built from values inserted in sorted order, the tree will be as tall as the number of values inserted, which is terrible for lookup performance.

The difference between RB and AVL trees is in the rotations needed to rebalance after an insert or delete: AVL trees need O(log N) rotations per rebalance, while RB trees need O(1). One example of where this constant complexity is useful is persistent data structures: if you need to track changes in order to roll back, with an AVL tree you would have to track a possible O(log N) changes per operation.

Why would you pay the cost of a tree over a hash table? ORDER! Hash tables have no ordering; a BST, by contrast, is always naturally ordered by virtue of its structure. So if you find yourself throwing a bunch of data into an array or other container and then sorting it later, a BST may be a better solution.

The tree's ordering property gives you a number of ordered iteration capabilities: in-order, depth-first, breadth-first, pre-order, post-order. These iteration algorithms are useful in different circumstances, should you want to look them up.
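
As an illustrative sketch (plain Python, with a minimal node class assumed), the in-order and pre-order traversals mentioned above are simple recursions:

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def in_order(node):
    """Yields keys in sorted order for any BST."""
    if node:
        yield from in_order(node.left)
        yield node.key
        yield from in_order(node.right)

def pre_order(node):
    """Yields each root before its subtrees -- useful for copying a tree."""
    if node:
        yield node.key
        yield from pre_order(node.left)
        yield from pre_order(node.right)

#        4
#       / \
#      2   6
#     / \
#    1   3
root = Node(4, Node(2, Node(1), Node(3)), Node(6))
print(list(in_order(root)))   # [1, 2, 3, 4, 6]
print(list(pre_order(root)))  # [4, 2, 1, 3, 6]
```

In-order traversal is exactly why a BST gives you sorted output "for free."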

Red-black trees are used internally by nearly every ordered container in language libraries: C++'s std::set and std::map, .NET's SortedDictionary, Java's TreeSet, and so on.

So trees are very useful, and you may use them quite often without even knowing it. You will most likely never need to write one yourself, though I would highly recommend it as an interesting programming exercise.

+8
Dec 02 '09 at 6:46

Red-black trees and B-trees are used in all kinds of persistent storage; because the trees stay balanced, the cost of breadth and depth traversals is kept down.

Almost all modern database systems use trees to store data.

+4
Aug 21 '08 at 19:09

BSTs make the world go round, as Mikel said. If you are looking for a good tree to implement, take a look at AVL trees (Wikipedia). They have a balancing condition, so they are guaranteed to be O(log n). This kind of search efficiency makes it logical to put them behind any indexing process. The only thing that would be more efficient is a hash function, but those get ugly quick, fast, and in a hurry. Also, you run into the birthday paradox (a.k.a. the pigeonhole problem).

Which textbook are you using? We used Data Structures and Algorithm Analysis in Java by Mark Allen Weiss. I literally have it open on my lap as I type this. It has a great section on red-black trees, and even includes the code necessary to implement all the trees it talks about.

+2
Aug 21 '08 at 19:12

The best description of red-black trees I have seen is the one in Introduction to Algorithms by Cormen, Leiserson, and Rivest. I could even understand it well enough to partially implement one (insert only). There are also a number of applets, such as this one, on various web pages that animate the process and let you watch a graphical representation of the algorithm building the tree structure.

+2
Sep 18 '08 at 16:40

Red-black trees stay balanced, so you never have to traverse deep to get items out. The time saved makes RB trees O(log(n)) in the WORST case, whereas unlucky plain binary trees can degenerate into a lopsided configuration and give you O(n) retrievals in the bad case. This does happen in practice, and on random data. So if you need time-critical code (database retrievals, network servers, etc.), you use RB trees to maintain ordered or unordered lists/sets.

But RB trees are not just for n00bs! If you are doing AI and you need to perform search, you'll find you're forking state information a lot. You can use a persistent red-black tree to create new states in O(log(n)). A persistent red-black tree keeps a copy of the tree before and after a mutating operation (insert/delete), but without copying the entire tree (normally an O(log(n)) operation). I have open-sourced a persistent red-black tree for Java:

http://edinburghhacklab.com/2011/07/a-java-implementation-of-persistent-red-black-trees-open-sourced/

+2
Jul 25 '11 at 20:17

There's been a lot of heat here but not much light, so let's see if we can provide some.

First, an RB tree is an associative data structure, unlike, say, an array, which cannot take a key and return the associated value — well, unless the "key" is an integer index into a 0%-sparse run of contiguous integers. An array also cannot grow in size (yes, I know about realloc(), but under the covers that requires a new array and then a memcpy()), so if you have either of these requirements, an array won't do. An array's memory efficiency is perfect — zero waste — but it is neither smart nor flexible, realloc() notwithstanding.

Second, unlike bsearch() into an array of elements, which is also a form of associative lookup, an RB tree can grow (and shrink) in size dynamically. bsearch() works fine for indexing a data structure of known size that stays that size. So if you don't know the size of your data in advance, or new elements need to be added and removed, bsearch() is out. bsearch() and qsort() are well supported in classic C and have great memory efficiency, but are not dynamic enough for many applications. They are my personal favorites, though, because they are fast and easy, and, if you're not dealing with a real-time application, quite often flexible enough. In addition, in C/C++ you can sort an array of pointers to data records, keying on the struct{} member you want to compare, and then rearrange the pointers in the pointer array so that reading the pointers in order at the end of the sort yields your data in sorted order. Using this with memory-mapped data files is extremely memory-efficient, fast, and easy. All you need to do is add a couple of "*"s to your comparison function.
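
A Python analog of that pointer-sort trick — sorting references to records by a chosen field rather than moving the records themselves — is sketched below; the record layout is invented purely for illustration:

```python
# The records stay put; we sort a separate list of references to them,
# playing the role of the C array of pointers.
records = [
    {"id": 3, "name": "carol"},
    {"id": 1, "name": "alice"},
    {"id": 2, "name": "bob"},
]

# Sorting 'refs' does not touch or copy the underlying records;
# the key function stands in for the "*"-dereferencing comparator.
refs = sorted(records, key=lambda r: r["id"])

print([r["name"] for r in refs])  # ['alice', 'bob', 'carol']
print(records[0]["name"])         # 'carol' -- original order untouched
```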

Third, unlike a hash table, which must also be of fixed size and cannot be grown once filled, an RB tree grows automatically and stays balanced, maintaining its O(log(n)) performance guarantee. Especially when the RB tree's key is an int, it can be faster than a hash, because although a hash table's complexity is O(1), that 1 can be a very expensive hash calculation. A tree's several one-clock integer compares often beat a 100-clock-plus hash calculation, to say nothing of rehashing, and malloc()ing space for hash collisions and re-inserts. Finally, if you want ISAM-style access — sequential as well as keyed access to your data — a hash is ruled out, since there is no ordering inherent in a hash table, in contrast to the natural ordering of data in any tree implementation. The classic use of a hash table is to provide keyed access to a table of reserved words for a compiler. Its memory efficiency is excellent.

Fourth, and very low on any list, is the linked (or doubly linked) list, which, unlike an array, naturally supports element insertion and deletion — and, as that implies, resizing. It is the slowest of all these data structures, since each element only knows how to get to the next element, so you have to follow, on average, (element_knt / 2) links to find your datum. It is mostly used where insertions and deletions somewhere in the middle of the list are common, and especially where the list is circular and feeds an expensive process that makes the time spent reading links relatively small. My general Rx: use an arbitrarily large array instead of a linked list if your only requirement is that it grow. If you run out of room in an array, you can realloc() a larger one. The STL does this for you under the covers when you use a vector. Crude, but potentially 1,000x faster if you don't need insertions, deletions, or keyed lookups. Its memory efficiency is poor, especially for doubly linked lists. In fact, a doubly linked list, requiring two pointers per element, is exactly as memory-inefficient as a red-black tree while having NONE of its attractive fast, ordered retrieval characteristics.

Fifth, trees support many more operations on their sorted data than any of these other structures. For example, many database queries make use of the fact that a range of leaf values can be easily specified by naming their common parent and then focusing subsequent processing on the part of the tree that parent "owns". The potential for multithreading offered by this approach should be obvious, since only a small region of the tree needs to be locked — namely, only the nodes the parent owns, plus the parent itself.
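
To illustrate the range-query idea on ordered data, here is a sketch using Python's `bisect` module on a sorted array as a stand-in for descending to the common parent of a subtree; the keys are made up for the example:

```python
import bisect

# Sorted keys, exactly as a tree's in-order traversal would yield them.
keys = [3, 7, 12, 18, 25, 31, 44]

def range_query(sorted_keys, lo, hi):
    """Return all keys k with lo <= k <= hi, in O(log n + m) time."""
    left = bisect.bisect_left(sorted_keys, lo)    # first key >= lo
    right = bisect.bisect_right(sorted_keys, hi)  # one past last key <= hi
    return sorted_keys[left:right]

print(range_query(keys, 10, 30))  # [12, 18, 25]
```

In a real tree the two binary searches become two root-to-leaf descents, and the slice becomes the subtree (or leaf run) between them.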

In short, trees are the Cadillac of data structures. You pay a high price in memory usage, but you get a completely self-maintaining data structure. That's why, as pointed out in other answers, transaction databases use trees almost exclusively.

+2
03 Sep '13 at 21:19

Since you ask which tree people use, you should know that a red-black tree is fundamentally a 2-3-4 B-tree (i.e., a B-tree of order 4). A B-tree is not equivalent to a binary tree (which is what your question asked about).

Here's a great resource describing the original abstraction, known as the symmetric binary B-tree, which later evolved into the RB tree. You will need a good understanding of B-trees before it makes sense. To summarize: a "red" link in a red-black tree is a way of representing nodes that are part of the same B-tree node (values within a key range), while "black" links are nodes that are connected vertically in the B-tree.

So, here is what you get when you translate the rules of a red-black tree into B-tree terms (I use the format: red-black tree rule => B-tree equivalent):

1) A node is either red or black. => A node in a B-tree can either be part of an existing node or start a node at a new level.

2) The root is black. (This rule is sometimes omitted, since it doesn't affect the analysis.) => The root node can be thought of as part of an internal root node, the child of an imaginary parent node.

3) All leaves (NIL) are black. (All leaves are the same color as the root.) => Since one way of representing an RB tree is by omitting the leaves, we can rule this out.

4) Both children of every red node are black. => The children of an internal node in a B-tree always lie on another level.

5) Every simple path from a given node to any of its descendant leaves contains the same number of black nodes. => A B-tree is kept balanced by requiring that all leaf nodes be at the same depth (hence the height of a B-tree node corresponds to the number of black links from root to leaf in the red-black tree).

In addition, there is a simpler "non-standard" implementation by Robert Sedgewick here (he is the author, with Kevin Wayne, of the book Algorithms).

+1
Jan 17 '13 at 13:19

If you want to see what a red-black tree looks like graphically, I have coded an implementation of a red-black tree that you can download here.

0
Dec 08 '08 at 16:06

IME, almost nobody understands the RB tree algorithm. People can repeat the rules back to you, but they don't understand why those rules exist or where they come from. I am no exception :-)

For this reason, I prefer the AVL algorithm, because it's easy to understand. Once you understand it, you can code it from scratch, because it makes sense to you.

0
Mar 24 '09 at 21:04

Trees can be fast. If you have a million nodes in a balanced binary tree, it takes about twenty comparisons on average to find any one item. If you have a million nodes in a linked list, it takes five hundred thousand comparisons on average to find the same item.
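
The arithmetic behind those counts is just a base-2 logarithm versus a linear scan; a quick sketch:

```python
import math

n = 1_000_000

# A balanced tree halves the candidate range at every comparison.
balanced_tree_comparisons = math.ceil(math.log2(n))

# A linked list is scanned front to back; on average you read half of it.
linked_list_comparisons = n // 2

print(balanced_tree_comparisons)  # 20
print(linked_list_comparisons)    # 500000
```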

If the tree is unbalanced, it can be as slow as the list, and it also takes more memory to store. Imagine a tree in which most nodes have a right child but no left child; it is a list, but you still have to hold memory space for a left node in case one shows up.

Anyway, the AVL tree was the first balanced binary tree algorithm, and the Wikipedia article on it is pretty clear. The Wikipedia article on red-black trees, frankly, is clear as mud.

Beyond binary trees, B-trees are trees in which each node can hold many values. "B-tree" is not short for "binary tree"; it's just the name. They are really useful for making full use of memory: each node of the tree can be sized to fit into one block of memory, so you aren't (slowly) chasing down lots of different things in memory that have been paged out to disk. Here is a phenomenal example of a B-tree.

0
Jul 09 '10 at 19:57

If you'd like, take a look at my implementation of a red-black tree: http://code.google.com/p/cstl/source/browse/src/c_rb.c

0
Apr 12


