Balanced Binary Trees vs. Indexed Skiers

I'm not sure that the question should be here either by programmers (or some other SE site), but I was interested to learn about the corresponding differences between balanced binary trees and indexed skipists . A problem arose in the context of this issue . From Wikipedia:

Skip lists are a probabilistic data structure that probably crowds out balanced trees as an implementation method for many applications. Skipped list algorithms have the same asymptotic expected time frames as balanced trees, and are simpler, faster, and use less space.

Do the spatial requirements of a skipist correspond to the depth of the hierarchy? And aren't binary trees easier to use, at least for search (provisioning, inserting, and deleting in balanced BSTs can be tricky)? Are there other advantages / disadvantages for skipists?

+4
source share
4 answers

(Some parts of your question (ease of use, simplicity, etc.) are a bit subjective, and I will answer them at the end of this post.)

Look at the use of space. First, suppose you have a binary search tree with n nodes. What is the common use of space? Well, each node stores some data plus two pointers. You may also need some information to maintain balance information. This means that the overall use of space

n * (2 * sizeof (pointer) + sizeof (data) + sizeof (balance information))

So think of an equivalent skipist. You are absolutely right that the real amount of memory used by a skipist depends on the height of the nodes, but we can talk about the expected amount of space used by a skipist. As a rule, you select the node height in the skiplist, starting at 1, and then repeatedly flipping the fair coin, increasing the height while you turn the heads and stop as soon as you turn the tails. Given this setting, what is the expected number of pointers inside a skipist?

An interesting result of probability theory is that if you have a series of independent events with probability p, you need approximately 1 / p samples (in anticipation) before this event occurs. In our coin flipping example, we flip the coin until it appears on the tail, and since the coin is an honest coin (a head appears with a 50% probability), the expected number of tests required before we turn the tails, is 2. Since this last bounce ends the growth, the expected number of times a node grows in a skiplist is 1. Therefore, in anticipation, we expect the middle node to have only two pointers in it - one starting pointer and one pointer added. This means that the expected total use of space

n * (2 * sizeof (pointer) + sizeof (data))

Compare this to the size of the node in a balanced binary search tree. If a non-zero amount of space is required to store the balance information, the skipist will actually use (as expected) less memory than a balanced BST. Please note that many types of balanced BSTs (e.g. treap s) require a lot of balance information, while others ( red / black trees , AVL trees ) have balance information, but can hide this information in the low order bit of its pointers. while others ( splay trees ) do not have balance information at all. Therefore, this is not a guaranteed gain, but in many cases it will use space.

As for other questions about simplicity, lightness, etc., it really depends. I personally find that the code to search for an element in BST is much simpler than the code to search for in skiplist. However, the rotation logic in balanced BSTs is often significantly more complicated than the insert / delete logic in a skiplist; try to see if you can cross all the possible cases of rotation in a red / black tree without accessing the link, or see if you can remember all the cases of a zigzag / zag or zag / zag from the splay tree. In this sense, it may be a little easier to remember the logic to insert or remove from a skipist.

Hope this helps!

+4
source

And aren't binary trees easier to use, at least for search (provisioning, inserting, and deleting in balanced BSTs can be tricky)?

Trees are “more recursive” (trees and subtrees), and SkipLists are “more iterative” (levels in an array). Of course, this depends on the implementation, but SkipLists can also be very useful for practical applications.

It’s easier to search in trees because you don’t need to iterate over levels in an array.

Are there other advantages / disadvantages for skipists?

  • Slide lists are easier to implement. This is a bit relative, but it's easier to implement a full-featured SkipList than the delete and balance operations in BinaryTree.

  • Trees can be persistent (better for functional programming).

  • It’s easier to remove items from the SkipLists list than internal nodes in the binary tree.

  • It's easier to add elements to binary trees (maintaining balance is another problem)

  • Binary trees are deterministic, so it’s easier to study and analyze them.

My advice . If you have time, you should use a balanced binary tree. If you have little time, use the skip list. If you do not have time, use the library.

+2
source

You are right that this is not an ideal forum.

The comment you commented on was written by the author of the original pass list sheet: a not-so-objective statement. It was 23 years old, and red-black trees are still apparently more common than badge lists. An exception is the redis key-value pair database , which includes skip lists as one option among its data structures.

Skipping lists is very cool. But the only advantage of the space that I could show in the general randomized case is that there is no need to store balance flags: two bits per value. The hierarchy is supposed to be dense enough to replicate binary tree performance. You can do this as the price of determinism (vicious randomization). A good feature of SL is that you can use less dense hierarchies to trade constant speed factors for space.

Side note: it has often not been discussed that if you do not need to go in sorted order, you can randomize unbalanced binary trees by simply encrypting the keys (i.e. matching with pseudo-random encrypted text with something very simple like RC4). Such trees are absolutely trivial to implement.

+1
source

Something not mentioned so far is that skip lists can be beneficial for concurrent operations. If you read the source of ConcurrentSkipListMap by Doug Lea ... take a look at the comments. It mentions:

There are no known effective block-free insert and delete algorithms for search trees. The immutability of the "downstream" links of index nodes (as opposed to the mutable "left" fields in true trees) makes this possible using only CAS operations.

+1
source

All Articles