I'm having trouble storing hundreds of millions of 16/32-byte key/value pairs in a hash database on my SSD.
With Kyoto Cabinet: when it works, it inserts about 70,000 records/s. Once the rate drops, it falls to 10-500 records/s. With the default settings, the drop happens after about a million records. Looking at the documentation, that is the default number of buckets in the hash array, so it makes sense. I increased this number to 25 million, and indeed it works fine up to about 25 million records. The problem is that as soon as I push the number of buckets to 30 million or more, the insert rate drops to 10-500 records/s from the very beginning. Kyoto Cabinet is not designed to allow increasing the number of buckets after the database has been created, so I cannot insert more than 25 million records.
1/ Why does the KC insert rate become very low as soon as the number of buckets exceeds 25M?
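For illustration, here is a minimal sketch of how I create the KC hash database from Java; it assumes the kyotocabinet Java binding, and the file name and the bnum/msiz values are placeholders, not my exact settings. The bucket count (bnum) can only be passed as a path suffix at creation time:

```java
import kyotocabinet.DB;

public class KcInsertSketch {
    public static void main(String[] args) {
        DB db = new DB();
        // bnum (bucket count) and msiz (mapped memory size) are tuned at
        // creation time via the path suffix; 30M is where my slowdown starts.
        String path = "store.kch#bnum=30000000#msiz=1073741824";
        if (!db.open(path, DB.OWRITER | DB.OCREATE)) {
            System.err.println("open error: " + db.error());
            return;
        }
        byte[] key = new byte[16];   // 16-byte key
        byte[] value = new byte[32]; // 32-byte value
        if (!db.set(key, value)) {
            System.err.println("set error: " + db.error());
        }
        db.close();
    }
}
```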
With Berkeley DB: the best speed I got is slightly lower than KC, closer to 50,000 records/s, but still fine. With the default settings, just as with KC, the speed drops suddenly after about a million records. I know BDB is designed to grow its number of buckets gradually. Regardless, I tried to increase the initial number by playing with HashNumElements and FillFactor, but every one of these attempts made things worse. So I still cannot insert more than 1-2 million records with BDB. I tried enabling non-synchronized transactions, tried different checkpoint rates, and increased the cache size. Nothing prevents the drop-off.
2/ What could cause the BDB insert rate to drop after 1-2 million inserts?
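For reference, a minimal sketch of how those parameters can be set through the com.sleepycat.db Java binding; the file name and the HashNumElements/FillFactor values below are placeholders, not the exact values I tried:

```java
import com.sleepycat.db.Database;
import com.sleepycat.db.DatabaseConfig;
import com.sleepycat.db.DatabaseEntry;
import com.sleepycat.db.DatabaseType;

public class BdbInsertSketch {
    public static void main(String[] args) throws Exception {
        DatabaseConfig config = new DatabaseConfig();
        config.setAllowCreate(true);
        config.setType(DatabaseType.HASH);
        // Pre-size the hash table (placeholder values):
        config.setHashNumElements(100_000_000); // expected number of records
        config.setHashFillFactor(20);           // roughly (pagesize - 32) / (key + data + 8)

        Database db = new Database("store.db", null, config);
        byte[] key = new byte[16];
        byte[] value = new byte[32];
        db.put(null, new DatabaseEntry(key), new DatabaseEntry(value));
        db.close();
    }
}
```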
Note: I work with Java, and when the speed drops the CPU usage falls to 0-30%, whereas it stays at 100% when everything works correctly.
Note: Stopping the process and resuming the inserts does not change anything, so I do not think it is due to memory limits or garbage collection.
Thanks.
Kai elvin