Level 0 is data in memory; the other levels live on disk. The key point is that the data within each level is sorted. If level 1 consists of three 2 MB files, then (as an example) file1 holds keys 0..50 (sorted), file2 holds 150..200, and file3 holds 300..400. So when the memory level fills up, we need to push its data to disk as efficiently as possible, which means a sequential write (using as few disk seeks as possible).

Imagine we have keys 60..120 in memory: great, we just write them out sequentially as a single file, which becomes another file in level 1. Very efficient! But now suppose level 1 is much larger than level 0 (which is reasonable, since level 0 is memory). In that case level 1 contains many files, and our in-memory keys (60..120) now overlap many of them, because the key ranges in level 1 are very fine-grained. To merge level 0 into level 1 we would have to read many files (many random reads), build the new files in memory, and write them back out.

This is where multiple levels come in: we keep many levels, each roughly 10x larger than the previous one, but not much more, so that when we need to move data from level i-1 down to level i, we have a good chance of reading only a small number of files.
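To make the overlap problem concrete, here is a minimal sketch (not any real store's code; the data layout and helper names are hypothetical) of how a compaction might pick the level-1 files whose key ranges overlap a flushed memtable, and then merge the sorted runs into one sequential write:

```python
import heapq

# Each file/run is a list of (key, value) pairs, already sorted by key,
# so a file's key range is simply (run[0][0], run[-1][0]).

def overlapping_files(level_files, lo, hi):
    """Pick the files in a level whose key range overlaps [lo, hi]."""
    return [f for f in level_files
            if not (f[-1][0] < lo or f[0][0] > hi)]

def merge_runs(runs):
    """Merge sorted runs into one sorted run; on duplicate keys the
    earliest-listed run (the newest data, e.g. the memtable) wins."""
    tagged = (((k, age, v) for k, v in run) for age, run in enumerate(runs))
    merged, last_key = [], object()
    for key, _age, value in heapq.merge(*tagged):
        if key != last_key:          # first hit per key = newest version
            merged.append((key, value))
            last_key = key
    return merged

# Flushed memtable holds keys 60..120; level 1 is fine-grained.
memtable = [(k, f"v{k}") for k in range(60, 121, 10)]
level1 = [[(k, f"old{k}") for k in range(lo, lo + 30, 10)]
          for lo in range(0, 300, 30)]          # ten small files

victims = overlapping_files(level1, memtable[0][0], memtable[-1][0])
print(len(victims), "level-1 files must be read and rewritten")
new_run = merge_runs([memtable] + victims)      # one sequential write
```

The finer-grained level 1 is relative to the flushed range, the more "victim" files must be read and rewritten, which is exactly the cost the ~10x level sizing tries to bound.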
Also, since the data may still change, there is no need to push it eagerly down to the deeper, more expensive levels: it may be overwritten or deleted first, in which case we avoid those expensive merges entirely. Data that does survive to the last level is statistically the least likely to change, so it is the best candidate for the most expensive merge, the one into the last level.
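A minimal sketch of the read path (levels modeled as plain dicts, with a hypothetical TOMBSTONE marker for deletes) shows why deferring works: lookups check levels from newest to oldest, so an update or delete sitting in an upper level shadows the old copy below it, and that old copy only needs rewriting when its own level eventually fills up:

```python
TOMBSTONE = object()   # hypothetical marker for a deleted key

def get(levels, key):
    """Look the key up from the newest level down; the first version
    found wins, so upper levels shadow anything deeper."""
    for level in levels:            # levels[0] = memory, then disk levels
        if key in level:
            value = level[key]
            return None if value is TOMBSTONE else value
    return None

levels = [
    {"a": TOMBSTONE, "b": "b-new"},   # level 0 (memory): a delete + an update
    {"b": "b-old", "c": "c1"},        # level 1
    {"a": "a-ancient", "d": "d1"},    # last level: untouched so far
]
print(get(levels, "a"))   # None  -> the deep copy of "a" is already dead
print(get(levels, "b"))   # b-new -> the level-1 copy never needed merging
```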