Key/value datastore extremely slow on SSD

What I am sure of:

  • I work with Java/Eclipse on Linux and am trying to store a very large number of key/value pairs of 16/32 bytes respectively on disk. Keys are completely random, generated with SecureRandom.
  • The speed is constant at ~50,000 inserts/sec until ~1 million records are reached.
  • Once that limit is reached, the Java process oscillates every 1-2 seconds between 0% and 100% CPU, between 150 MB and 400 MB of memory, and between 10 and 100 inserts/sec.
  • I have tried both Berkeley DB and Kyoto Cabinet, with both B-trees and hash tables, and get the same results (a rough sketch of the insert loop follows this list).
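For reference, here is a minimal sketch of the kind of insert loop described above, written against the Berkeley DB Java Edition API (the environment directory, record count and progress interval are placeholder values chosen for illustration, not figures from the question):

    import java.io.File;
    import java.security.SecureRandom;
    import com.sleepycat.je.Database;
    import com.sleepycat.je.DatabaseConfig;
    import com.sleepycat.je.DatabaseEntry;
    import com.sleepycat.je.Environment;
    import com.sleepycat.je.EnvironmentConfig;

    public class InsertBench {
        public static void main(String[] args) throws Exception {
            EnvironmentConfig envCfg = new EnvironmentConfig();
            envCfg.setAllowCreate(true);
            Environment env = new Environment(new File("/tmp/bench-env"), envCfg);

            DatabaseConfig dbCfg = new DatabaseConfig();
            dbCfg.setAllowCreate(true);
            Database db = env.openDatabase(null, "bench", dbCfg);

            SecureRandom rnd = new SecureRandom();
            byte[] key = new byte[16];
            byte[] value = new byte[32];
            long start = System.currentTimeMillis();

            for (int i = 1; i <= 10000000; i++) {
                rnd.nextBytes(key);    // completely random 16-byte key
                rnd.nextBytes(value);  // 32-byte value
                db.put(null, new DatabaseEntry(key), new DatabaseEntry(value));
                if (i % 100000 == 0) {
                    long elapsed = System.currentTimeMillis() - start;
                    System.out.printf("%d records, %.0f inserts/sec%n",
                            i, i * 1000.0 / elapsed);
                }
            }
            db.close();
            env.close();
        }
    }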

What might be contributing:

  • I am writing to an SSD.
  • Each insert causes on average 1.5 reads, so reads and writes are constantly mixed.

My suspicion is that the good speed of ~50,000 inserts/sec holds until some cache/buffer limit is reached, and that the big slowdown after that is because the SSD does not handle mixed reads/writes well, as suggested in this question: Low-latency key-value store for SSD.

Question:
What could cause such an extreme slowdown? It can't all be the SSD's fault: lots of people happily use SSDs for high-speed database workloads, and I'm sure they mix reads and writes a lot.

Thanks.

Edit: I am definitely not hitting any memory limit; there is always room for the Java process to allocate more memory.
Edit: Dropping the reads and performing only the inserts does not change the problem.

Last edit: For the record, with hash tables the problem is related to the initial number of buckets. In Kyoto Cabinet this number cannot be changed after creation and defaults to ~1 million, so it is important to get it right when the file is created (1 to 4 times the maximum number of records to store). With BDB, the number of buckets is designed to grow gradually, but since that consumes resources it is better to predefine the number in advance.
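For illustration, a rough sketch of pre-sizing the bucket count with the Kyoto Cabinet Java binding, where tuning parameters are appended to the file name; the file name and the 40-million figure are placeholders, to be adjusted to your own record count:

    import kyotocabinet.DB;

    public class TuneBuckets {
        public static void main(String[] args) {
            DB db = new DB();
            // For a hash database (.kch) the bucket count is fixed at creation time,
            // so it is set here via the #bnum path suffix: roughly 4x the expected
            // number of records (40 million buckets for ~10 million records).
            if (!db.open("bench.kch#bnum=40000000", DB.OWRITER | DB.OCREATE)) {
                System.err.println("open error: " + db.error());
                return;
            }
            // ... run the same insert loop, using db.set(key, value) ...
            db.close();
        }
    }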

1 answer

Your problem may be related to the strong durability guarantees of the databases you are using.

Basically, any ACID-compliant database needs at least one fsync() call per transaction commit. This has to happen to guarantee durability (otherwise updates could be lost in the event of a system failure), but also to guarantee the internal consistency of the database on disk. The database API will not return from the insert operation until the fsync() call has completed.

fsync() can be a very slow operation on many operating systems and disk hardware, even on SSDs. (The exception is SSDs with battery or capacitor backing: they can treat cache-flush operations essentially as no-ops, avoiding exactly the latency you are probably experiencing.)
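If you want to measure how expensive a synced write is on your particular drive, a quick stand-alone probe along these lines can help. It uses FileChannel.force(), which behaves like fsync(), and is only a rough approximation of what the database does internally:

    import java.io.File;
    import java.io.RandomAccessFile;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;

    public class FsyncCost {
        public static void main(String[] args) throws Exception {
            File f = File.createTempFile("fsync-probe", ".bin");
            RandomAccessFile raf = new RandomAccessFile(f, "rw");
            FileChannel ch = raf.getChannel();
            ByteBuffer buf = ByteBuffer.allocate(48); // roughly one 16+32 byte record

            int n = 1000;
            long start = System.nanoTime();
            for (int i = 0; i < n; i++) {
                buf.rewind();
                ch.write(buf);
                ch.force(true); // flush data and metadata to disk, like fsync()
            }
            long ms = Math.max(1, (System.nanoTime() - start) / 1000000);
            System.out.printf("%d synced writes in %d ms (~%.0f/sec)%n",
                    n, ms, n * 1000.0 / ms);
            ch.close();
            raf.close();
            f.delete();
        }
    }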

The solution is to batch all your writes inside one big transaction. I don't know about Berkeley DB, but for sqlite, performance can be improved dramatically this way.
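For sqlite accessed from Java via JDBC, batching looks roughly like this; the xerial sqlite-jdbc driver, the table layout and the batch size of 10,000 are my assumptions, not something from the question:

    import java.security.SecureRandom;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    public class BatchedInserts {
        public static void main(String[] args) throws Exception {
            Class.forName("org.sqlite.JDBC"); // assumes the xerial sqlite-jdbc driver
            Connection con = DriverManager.getConnection("jdbc:sqlite:bench.db");
            con.createStatement().execute(
                    "CREATE TABLE IF NOT EXISTS kv (k BLOB PRIMARY KEY, v BLOB)");

            con.setAutoCommit(false); // pay the fsync() cost once per batch, not per row
            PreparedStatement ps =
                    con.prepareStatement("INSERT OR REPLACE INTO kv VALUES (?, ?)");
            SecureRandom rnd = new SecureRandom();
            byte[] key = new byte[16];
            byte[] value = new byte[32];

            for (int i = 1; i <= 1000000; i++) {
                rnd.nextBytes(key);
                rnd.nextBytes(value);
                ps.setBytes(1, key);
                ps.setBytes(2, value);
                ps.executeUpdate();
                if (i % 10000 == 0) {
                    con.commit(); // one durable commit per 10,000 inserts
                }
            }
            con.commit();
            con.close();
        }
    }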

To see whether this is your problem at all, you could watch the database process with strace and look for frequent fsync() calls (more than one per second would be a pretty strong hint).

Update: If you are absolutely sure that you do not need durability, you can try the answer to Optimize application performance in Berkeley DB; if you do need it, you should look into the TDS (transactional data store) feature of Berkeley DB.
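As a rough sketch of what relaxing durability can look like, assuming Berkeley DB Java Edition (the base Berkeley DB Java API has similar knobs such as txn_nosync; see the answer linked above for the authoritative settings):

    import java.io.File;
    import com.sleepycat.je.Durability;
    import com.sleepycat.je.Environment;
    import com.sleepycat.je.EnvironmentConfig;

    public class RelaxedDurability {
        public static void main(String[] args) {
            EnvironmentConfig cfg = new EnvironmentConfig();
            cfg.setAllowCreate(true);
            cfg.setTransactional(true); // transactional (TDS-style) store
            // Trade durability for speed: commits are not individually fsync()'d,
            // so a crash may lose the most recent transactions, but recovery
            // still leaves the store internally consistent.
            cfg.setDurability(Durability.COMMIT_NO_SYNC);
            Environment env = new Environment(new File("/tmp/bench-env"), cfg);
            // ... open the database and run the insert loop as before ...
            env.close();
        }
    }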
