Why does LevelDB need more than two levels?

I think only two levels (level-0 and level-1) should be enough, so why does LevelDB need level-2, level-3, and so on?

+6
3 answers

I will point you towards some articles on LevelDB and the underlying storage structure.

The LevelDB documentation itself discusses compaction (merging) between levels:

These compactions have the effect of gradually migrating new updates from the young level to the largest level using only bulk reads and writes (i.e., minimizing expensive seeks).

LevelDB is similar in structure to a Log-Structured Merge-tree (LSM-tree). The LSM-tree paper discusses the different levels, if you are interested in the analysis behind them. If you can work through the math, it is probably the best way to understand the data structure.

A much easier read is an analysis of LevelDB that talks about how the data store relates to LSM-trees, but as far as your question about levels goes, all it says is:

Finally, having hundreds of SSTables on disk is also not a great idea, so we periodically run a process to merge SSTables on disk.

The LevelDB documentation probably gives the best answer: maximize the size of reads and writes, because LevelDB lives on disk, where seeks are slow.

Good luck

+9

I think this is mainly to keep the merging of levels simple and fast.

In LevelDB, level-(i+1) holds approximately 10 times more data than level-i. It is somewhat like a multi-level cache structure: if the database has 1000 entries between keys x1..x2, then 10 of the most frequently accessed ones in that range might be at level-1, 100 in the same range at level-2, and the rest at level-3 (this is not accurate, just an intuitive picture of the levels). With this setup, to merge a file from level-i we need to look at no more than about 10 files in level-(i+1), and all of that can be read into memory, merged quickly, and written back out. So each compaction/merge operation reads a relatively small amount of data.

On the other hand, with only 2 levels, the key range of one level-0 file could potentially overlap 1000 files at level-1, and all of them would have to be opened for the merge, which would be quite slow. Note that an important assumption here is that files have a fixed size (say, 2 MB). With variable-length files at level-1, your idea could still work, and I think a variant of this is used in systems like HBase and Cassandra.
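The cost difference described above can be sketched with some back-of-the-envelope numbers (a hypothetical illustration with my own assumed figures, not code from LevelDB):

```python
# Sketch of merge cost, assuming fixed 2 MB files, a 10x size ratio
# between levels, and keys spread uniformly across both levels.

FILE_SIZE_MB = 2

def files_opened_per_merge(upper_files, lower_files):
    """Lower-level files whose key ranges a single upper-level file
    overlaps, if keys are spread uniformly."""
    return max(1, lower_files // upper_files)

# Leveled layout: 100 files at level-i, 1000 at level-(i+1).
# Merging one file touches ~10 files below, i.e. ~22 MB of I/O.
leveled = files_opened_per_merge(100, 1000)

# Flat two-level layout: a single level-0 file can overlap all 1000
# level-1 files, i.e. ~2 GB of reads and rewrites for one 2 MB file.
flat = files_opened_per_merge(1, 1000)

print(leveled, flat)  # 10 1000
```

The 10x ratio keeps the first number small and roughly constant at every level, which is the whole point of having many levels.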

Now, if you are worried about lookup latency with many levels: again, it works like a multi-level cache structure, where the most recently written data sits in the higher (younger) levels, which helps with typical locality of reference.

+4

Level 0 is the data in memory; the other levels are data on disk. The important part is that the data within each level is sorted. If level-1 consists of three 2 MB files, then file1 holds keys 0..50 (sorted), file2 holds 150..200, and file3 holds 300..400 (as an example). So when the memory level fills up, we need to push its data to disk in the most efficient way possible, which is a sequential write (using as few disk seeks as possible). Imagine the memory level holds keys 60..120: great, we just write them out sequentially as one file, which becomes file2 of level-1. Very efficient!

But now imagine that level-1 is much larger than level-0 (which is reasonable, since level-0 is memory). In that case level-1 has many files, and our in-memory keys (60..120) now overlap many of them, because the key ranges in level-1 are very fine-grained. To merge level-0 into level-1 we would have to read many files, perform many random seeks, build new files in memory, and write them all out. This is where the many levels come in: we keep many levels, each only somewhat larger (about 10x) than the previous one, so that when we need to move data from level-(i-1) to level-i, we have a good chance of reading the smallest possible number of files.
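The merge itself can be pictured as a k-way merge over only the files that overlap the incoming run. A minimal sketch of this idea (the names and structure are my own, not LevelDB's actual C++ code), using the key ranges from the example above:

```python
import heapq

def compact(mem_run, level_files):
    """Merge a sorted in-memory run of (key, value) pairs with the
    sorted on-disk files whose key ranges overlap it. Returns the
    merged run and how many files had to be read."""
    lo, hi = mem_run[0][0], mem_run[-1][0]
    overlapping = [f for f in level_files
                   if f[-1][0] >= lo and f[0][0] <= hi]
    return list(heapq.merge(mem_run, *overlapping)), len(overlapping)

# Level-1 files from the example: keys 0..50, 150..200, 300..400.
files = [[(0, 'a'), (50, 'b')],
         [(150, 'c'), (200, 'd')],
         [(300, 'e'), (400, 'f')]]

# Keys 60..120 overlap nothing: zero files read, a pure sequential write.
_, touched = compact([(60, 'x'), (120, 'y')], files)
print(touched)  # 0

# Keys 40..160 overlap two files, which must be read and rewritten.
_, touched = compact([(40, 'p'), (160, 'q')], files)
print(touched)  # 2
```

The more files the incoming run overlaps, the more random reads the merge needs, which is exactly why each level is kept only about 10x larger than the one above it.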

Also, since data can change, there is often no need to propagate it down to the more expensive levels at all (it may be modified or deleted first), so we avoid expensive merges. The data that does reach the last level is statistically the least likely to change, which makes it the best fit for the most expensive merge: the one into the last level.

+1
