I am using Delphi 2009. I have a very simple data structure with two fields:
- A string that is the key field I need to get, and usually has a length of 4 to 15 characters.
- A string that represents a data field that can be of any size, from 1 character to 10,000 characters.
The difficulty is that I may have several million of these records, so together they may total up to 10 GB. Obviously I'm looking for an on-disk solution, not an in-memory one.
My program needs to randomly retrieve these records by the key field, and this lookup should be made as efficient as possible.
Should I use a database for such a simple structure, and if so, which database is the best to handle this and the easiest to implement?
Alternatively, is there a simple data structure on disk that does not require a full-blown database that will work just as well?
Well, all it took was one answer to bring me back to reality. I was looking for something simpler than even a simple database. But when the no-duh answer is to use a database, then I realize I have already answered this question with my own answer to another question: Best database for small applications and tools.

My answer there was DISQLite3, for the reasons I pointed out in that answer. And that is probably what I will go with for my implementation.

A few more good answers with some possibilities came in. That's great. I will be able to try a few different methods to see what works best.
After more deliberation, I have changed the accepted answer to the GpStructuredStorage solution.
In my case, a million records totaling several gigabytes puts a strain on a database's index structure. In particular, the B* tree that most databases use for their indexes is fast, but it slows down for certain operations, such as reindexing a million values.
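One common way around that reindexing cost (my own sketch, not something from the original discussion) is to bulk-load with no index present and build the index once at the end, so the B-tree is constructed in a single pass instead of being maintained row by row. Illustrated here with Python's sqlite3 rather than Delphi:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use a file path for the on-disk case
conn.execute("CREATE TABLE records (key TEXT, data TEXT)")

# Bulk-insert with no index present, so each row is a cheap append.
rows = ((f"KEY{i:07d}", f"payload {i}") for i in range(100_000))
conn.executemany("INSERT INTO records VALUES (?, ?)", rows)

# Build the index once, after the load, instead of updating it per insert.
conn.execute("CREATE UNIQUE INDEX idx_key ON records (key)")
conn.commit()
```

The same deferred-index pattern applies to Firebird or any other SQL database.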
About the only thing faster than a B* tree for an index is a hash table, and that is exactly what gabr supplies as a complement to his GpStructuredStorage solution. I find it quite elegant how he segments the hash value to create a 4-level directory structure.
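The segmented-hash idea can be sketched as follows. This is my own illustrative Python, with a hypothetical `key_to_path` helper and MD5 standing in for whatever hash function gabr's code actually uses:

```python
import hashlib

def key_to_path(key: str, levels: int = 4) -> str:
    """Map a key to a nested directory path by segmenting its hash.

    Illustrative sketch only; the real GpStructuredStorage-based code
    differs, but the fan-out idea is the same.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    # Use the first `levels` pairs of hex digits as nested folder names,
    # so at most 256 entries appear at each directory level.
    parts = [digest[i * 2:i * 2 + 2] for i in range(levels)]
    return "/".join(parts) + "/" + key

print(key_to_path("CUST1234"))
```

With 4 levels of 256-way fan-out there are about 4 billion leaf buckets, so even millions of keys stay spread thinly and each lookup touches a fixed, small number of directory hops.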
The main reason I can switch to the hash solution is that I only need random access by key. I do not need to sort by key. If sorting were required, the speed advantage of the hash table would be lost and the database system would win hands down.
When I get around to implementing this, I will benchmark this technique against a database. Maybe I will compare it against both Firebird and SQLite, which would both be worthy opponents.
Another follow-up:
I just discovered Synopse Big Table by A. Bouchez, which is designed for speed and matches the specifications of my question almost exactly. I will try it first when I do my implementation in a few months, and I will report my results back here.
Significantly later follow-up (July 2015)
I never did try Synopse Big Table. I have stuck with my B* tree so far. But now I have upgraded to Delphi XE8, and I plan to go with a database solution using FireDAC with SQLite.
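The schema itself is simple either way. Here is a minimal sketch of the key-value table and an indexed lookup, shown with Python's sqlite3 since the SQL is the same SQL that FireDAC would send from Delphi (table and column names are my own placeholders):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a file path gives the on-disk store
conn.execute("""
    CREATE TABLE records (
        key  TEXT PRIMARY KEY,   -- 4-15 character key; implies a unique index
        data TEXT NOT NULL       -- 1 to 10,000 character payload
    )
""")
conn.execute("INSERT INTO records VALUES (?, ?)", ("CUST1234", "some payload"))
conn.commit()

# Random retrieval by key uses the primary-key index, so it stays fast
# even with millions of rows.
row = conn.execute(
    "SELECT data FROM records WHERE key = ?", ("CUST1234",)
).fetchone()
print(row[0])  # -> some payload
```

Declaring the key as `PRIMARY KEY` gives SQLite a B-tree index for free, which is exactly the indexed random access the original question asked for.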