B-Tree vs Hash Table

Question

B-Tree vs Hash Table

In MySQL, the index type is a b-tree, and the element in the b-tree is accessed in the logarithmic arithmetic time O(log(n)) .

On the other hand, access to an item in a hash table is in O(1) .

Why is the hash table not used instead of the b-tree to access data inside the database?

+57

complexity-theory computer-science mysql b-tree

JohnJohnGa Sep 05 2018-11-11T00:

source share

4 answers

In fact, it looks like MySQL uses both types of indexes either a hash table or a b-tree according to the following link .

The difference between using a b-tree and a hash table is that the former allows you to use column matching in expressions that use =,>,> =, <, <= =, or BETWEEN, and the latter is only used for comparisons of comparisons that use the = or <=> operators.

+18

lmiguelvargasf May 19 '16 at 21:23

source share

Since btree can be easily offloaded to disk. In addition, the time complexity of hash tables is constant only for sufficiently large hash tables (there should be enough buckets to store data). The size of the database table is not known in advance, so the table must be rephrased now and then in order to get optimal performance from the hash table. Retrofitting is also expensive.

+11

Emil Vikström Sep 05 2018-11-11T00:

source share

I think that Hashmaps also do not scale and can be expensive when you need to redraw the entire map.

+4

Jonathan Weatherhead Sep 05 2018-11-11T00:

source share

The Surrican · Accepted Answer · 2011-09-05 09:58

You can access elements only by their primary key in the hash table. This is faster than using the tree algorithm ( O(1) instead of log(n) ), but you cannot select ranges (all between x and y ). Tree algorithms support this in log(n) , where a full scan of the O(n) table can be performed as a hash index. Also, the constant overhead of hash indexes is usually greater (which is not a factor in theta notation, but it still exists). In addition, tree algorithms are usually easier to maintain, grow with data, scales, etc.

Hash indexes work with predefined hash sizes, so you get some "buckets" in which objects are stored. These objects loop again to really find the right one inside this section.

So, if you have small sizes, you have a lot of overhead for small elements, large sizes lead to further scanning.

Modern hash table algorithms usually scale, but scaling can be inefficient.

There are really scalable hashing algorithms. Do not ask me how this works - this is also for me. AFAIKs, they evolved from scalable replication, where re-hashing is not easy.
It is called RUSH - R. < , and these algorithms are called RUSH algorithms.

However, there may be a point at which your index is larger than your hash size, and your whole index needs to be rebuilt. This is usually not a problem, but for huge huge databases it can take several days.

Compilation for tree algorithms is small, and they are suitable for almost every use case and, therefore, by default.

However, if you have a very accurate use case, and you know exactly what you just need, you can use hash indices.

B-Tree vs Hash Table

More articles: