Why does CouchDB use an append-only B+ tree and not a HAMT?

I have been reading about data structures, especially immutable ones, such as the append-only B+ tree used in CouchDB and the hash array mapped trie (HAMT) used in Clojure and some other functional programming languages.

The main reason that data structures which work well in memory may not work well on disk seems to be disk seek time caused by fragmentation, as with a regular binary tree.

However, a HAMT is also very shallow, so it should not require more seeks than a B-tree.

Another alleged reason is that deleting from a hash array mapped trie is more expensive than from a B-tree. But this is based on the assumption that we are talking about a dense vector, and it does not apply when either structure is used as a hash map.

What's more, a B-tree seems to require more rebalancing, so using it append-only produces more garbage (see the sketch below).
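To make the garbage concern concrete, here is a minimal sketch using an immutable binary search tree as a stand-in for CouchDB's actual append-only B+ tree (an assumption for brevity; the real structure has wide nodes): every update copies the path from the changed leaf up to the root, and the superseded nodes become dead space in an append-only file.

```python
# Minimal sketch of path copying in an immutable search tree, a
# simplified stand-in for an append-only B+ tree: every insert
# rewrites the path from the changed leaf up to the root, and the
# replaced nodes become garbage in an append-only file.

from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Node:
    key: int
    value: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def insert(root: Optional[Node], key: int, value: str) -> Node:
    """Return a new root; nodes off the search path are shared."""
    if root is None:
        return Node(key, value)
    if key < root.key:
        return Node(root.key, root.value, insert(root.left, key, value), root.right)
    if key > root.key:
        return Node(root.key, root.value, root.left, insert(root.right, key, value))
    return Node(key, value, root.left, root.right)  # overwrite: old node is now garbage

# Each insert allocates O(depth) new nodes; in an append-only file the
# replaced path is dead space until compaction reclaims it.
```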

So why do CouchDB and almost every other database and file system use B-trees?

[edit] Fractal trees? Log-structured merge trees? Mind = blown

[edit] B-trees in live use have fan-outs in the thousands, while a HAMT has a fan-out of 32. A HAMT with a fan-out of 1024 might be possible, but slower, because popcnt handles only 32 or 64 bits at a time.
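For context on that popcnt remark, here is a sketch of the bitmap-plus-popcount child lookup a fan-out-32 HAMT node typically uses (a simplified illustration, not Clojure's actual implementation):

```python
# Sketch of popcount-based child lookup in a HAMT node (fan-out 32).
# A node stores a 32-bit bitmap of occupied slots plus a dense array
# of children; the popcount of the bits below a slot gives its index.

def popcount(x: int) -> int:
    return bin(x).count("1")  # or x.bit_count() on Python 3.10+

def child_index(bitmap: int, hash_chunk: int) -> int:
    """hash_chunk is the next 5 bits of the key's hash (0..31)."""
    bit = 1 << hash_chunk
    if bitmap & bit == 0:
        raise KeyError("slot is empty")
    return popcount(bitmap & (bit - 1))  # count occupied slots below ours

# A fan-out of 1024 would need a 1024-bit bitmap, i.e. popcounts over
# sixteen 64-bit words per lookup, which is the slowdown the edit
# above refers to.
```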

+6
2 answers

B-trees are used because they are a well-understood algorithm that achieves “ideal” cost for reads in sorted order. Since the keys are kept sorted, moving to the next or previous key is very cheap.

A HAMT, or any other hash-based store, scatters keys effectively at random. Keys can be retrieved by their exact value, but there is no efficient way to find the next or previous key.
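To make the contrast concrete, here is a hedged sketch using Python's built-ins as stand-ins for the on-disk structures: a sorted store gives you the successor by position, while a hash map only answers exact-key lookups.

```python
import bisect

# Sorted storage (btree-like): keys live next to each other, so the
# successor of a key is just the next position.
keys = [10, 20, 30, 40]
i = bisect.bisect_right(keys, 20)
print(keys[i])  # 30 -- one step, no scan

# Hash storage (HAMT-like): only exact lookups are cheap; finding the
# successor of 20 means examining every key.
h = {10: "a", 20: "b", 30: "c", 40: "d"}
print(min(k for k in h if k > 20))  # 30 -- but this scanned the whole map
```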

As for the degree (fan-out), it is usually chosen indirectly, by choosing a page size. HAMTs are most often used in memory, with page sizes matched to cache lines, while btrees are most often used on secondary storage, where page sizes are tied to I/O and VM parameters.

Log-structured merge (LSM) is a different approach to sorted-order storage that achieves better write performance by trading away some read efficiency. That hit to read performance can be a problem for read-modify-write workloads, but the fewer uncached reads there are, the more an LSM delivers better overall throughput than a btree, at the cost of higher worst-case read latency.

An LSM also offers the promise of a wider performance envelope. Putting new data in its proper place is “deferred”, which offers the opportunity to tune read versus write performance by controlling the proportion of deferred cleanup work to live work. In theory, an ideal LSM with zero deferral is a btree, and one with 100% deferral is a log.
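Here is a toy illustration of that deferral (a sketch under simplifying assumptions: one unsorted memtable and a single sorted run, nothing like LevelDB's actual leveled design): writes are cheap appends, placing keys in sorted order is deferred until a flush, and reads must consult both places.

```python
import bisect

class ToyLSM:
    """Toy LSM: an unsorted in-memory buffer plus one sorted run.

    Writes are cheap (drop into the memtable); placing keys in their
    "proper" sorted position is deferred until flush(). Reads must
    check both places, which is the read-efficiency trade-off.
    """

    def __init__(self, flush_threshold: int = 4):
        self.memtable: dict[int, str] = {}    # recent, unsorted writes
        self.run: list[tuple[int, str]] = []  # sorted run (stands in for disk)
        self.flush_threshold = flush_threshold

    def put(self, key: int, value: str) -> None:
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()  # the deferred cleanup work happens here

    def flush(self) -> None:
        merged = dict(self.run)       # old run
        merged.update(self.memtable)  # newer values win
        self.run = sorted(merged.items())
        self.memtable.clear()

    def get(self, key: int) -> str | None:
        if key in self.memtable:      # newest data first
            return self.memtable[key]
        i = bisect.bisect_left(self.run, (key, ""))
        if i < len(self.run) and self.run[i][0] == key:
            return self.run[i][1]
        return None
```

Tuning the flush threshold is the crude analogue of controlling the deferral fraction: a threshold of one behaves like a btree (everything placed immediately), an unbounded threshold behaves like a log.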

However, LSM is more a "family" of algorithms than a specific algorithm like the btree. Their use is growing in popularity, but it is hindered by the lack of a de facto optimal LSM design. LevelDB/RocksDB is one of the most practical LSM implementations, but it is far from optimal.

Another approach to achieving write-throughput efficiency is to write-optimize btrees by deferring some of the write work, while still trying to keep read performance close to optimal.

Fractal trees, shuttle trees, and layered trees are this type of design, and occupy the hybrid gray area between btree and LSM. Rather than deferring writes to an offline process, they amortize the write deferral in a fixed way. For example, such a design might implement a fixed 60% write-deferral fraction. This means they cannot reach the 100%-deferral write performance of an LSM, but they also have more predictable read performance, which makes them more practical drop-in replacements for btrees. (As in Tokutek's commercial fractal-tree products for MySQL and MongoDB.)
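A rough sketch of that fixed amortization (hypothetical numbers and a degenerate single-child "tree" just to show the buffering; not Tokutek's actual design): each internal node carries a bounded buffer of pending writes, and only a full buffer is pushed one level down toward the leaves.

```python
# Rough sketch of buffered writes in a write-optimized btree node.
# Pending writes sit in a fixed-size buffer at the node; only when the
# buffer fills are they pushed down one level, amortizing the deferral
# in a fixed way rather than handing it to an offline process.

BUFFER_SIZE = 4  # hypothetical fixed deferral budget per node

class BufferedNode:
    def __init__(self, leaf: bool = False):
        self.leaf = leaf
        self.buffer: dict[int, str] = {}   # pending writes, not yet placed
        self.records: dict[int, str] = {}  # final resting place (leaves only)
        self.child = None                  # single child, for simplicity

    def put(self, key: int, value: str) -> None:
        if self.leaf:
            self.records[key] = value
            return
        self.buffer[key] = value
        if len(self.buffer) >= BUFFER_SIZE:
            for k, v in self.buffer.items():  # flush the whole buffer
                self.child.put(k, v)          # one level down at a time
            self.buffer.clear()

    def get(self, key: int) -> str | None:
        if key in self.buffer:                # reads must check buffers too
            return self.buffer[key]
        if self.leaf:
            return self.records.get(key)
        return self.child.get(key)

root = BufferedNode()
root.child = BufferedNode(leaf=True)
```

The "more predictable reads" point shows up in get(): a read descends one well-defined path and checks a bounded buffer per level, instead of consulting an unbounded number of LSM runs.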

+6

Btrees are sorted by their key, while similar keys have very different hash values in a hash map, so they end up stored far from each other. Now think about a query that scans a range, such as "give me yesterday's sales": with a hash map you have to scan the whole map to find them; with a btree on the sales_dtm column they are nicely clustered together, and you know exactly where to start and stop reading.
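A quick illustration of that range scan (a sketch with an in-memory sorted list standing in for the btree; the dates, amounts, and the sales_by_id layout are made up for the example):

```python
import bisect

# Btree-like: rows clustered by sales_dtm, so a range scan touches
# only the matching slice.
sorted_sales = [("2024-01-01", 100), ("2024-01-02", 75),
                ("2024-01-02", 250), ("2024-01-03", 300)]
lo = bisect.bisect_left(sorted_sales, ("2024-01-02",))
hi = bisect.bisect_left(sorted_sales, ("2024-01-03",))
yesterdays = sorted_sales[lo:hi]  # start and stop positions known exactly

# Hash-map-like: rows scattered by key hash, so "yesterday's sales"
# means examining every entry.
sales_by_id = {101: ("2024-01-01", 100), 102: ("2024-01-02", 75),
               103: ("2024-01-02", 250), 104: ("2024-01-03", 300)}
yesterdays_hash = [v for v in sales_by_id.values() if v[0] == "2024-01-02"]
```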

+3
