From a practical point of view, it would be easy to do this with BerkeleyDB, as Lyric suggested.
If you are more interested in theory than in practice, I would suggest approaching it as an “external sort”: read as much data as will fit in memory and sort it by key, then write the sorted chunk out as a separate file (a “run”). Once the whole input has been processed this way, the sorted runs can be merged into a single file with a k-way merge, which only needs the current record from each run in memory.
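For example, here is a minimal sketch of that two-pass approach in Java, assuming the records are stored one per line with the key first so that a plain lexicographic sort orders them by key; the chunk size and file naming are arbitrary choices:

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

public class ExternalSort {

    static final int MAX_LINES_IN_MEMORY = 1_000_000; // tune to the available heap

    // Pass 1: read fixed-size chunks, sort each chunk in memory, write it as a "run" file.
    static List<Path> writeSortedRuns(Path input) throws IOException {
        List<Path> runs = new ArrayList<>();
        try (BufferedReader in = Files.newBufferedReader(input)) {
            List<String> chunk = new ArrayList<>();
            String line;
            while ((line = in.readLine()) != null) {
                chunk.add(line);
                if (chunk.size() >= MAX_LINES_IN_MEMORY) {
                    runs.add(flush(chunk, runs.size()));
                    chunk.clear();
                }
            }
            if (!chunk.isEmpty()) runs.add(flush(chunk, runs.size()));
        }
        return runs;
    }

    static Path flush(List<String> chunk, int runId) throws IOException {
        Collections.sort(chunk); // lines start with the key, so lexicographic order is key order
        Path run = Files.createTempFile("run-" + runId + "-", ".txt");
        Files.write(run, chunk);
        return run;
    }

    // Pass 2: k-way merge of the sorted runs; only one line per run is in memory at a time.
    static void mergeRuns(List<Path> runs, Path output) throws IOException {
        List<BufferedReader> readers = new ArrayList<>();
        // Heap entries are {currentLine, indexOfRun}, ordered by the current line.
        PriorityQueue<Object[]> heap =
                new PriorityQueue<>((a, b) -> ((String) a[0]).compareTo((String) b[0]));
        try {
            for (Path run : runs) {
                BufferedReader r = Files.newBufferedReader(run);
                readers.add(r);
                String first = r.readLine();
                if (first != null) heap.add(new Object[]{first, readers.size() - 1});
            }
            try (BufferedWriter out = Files.newBufferedWriter(output)) {
                while (!heap.isEmpty()) {
                    Object[] smallest = heap.poll();
                    out.write((String) smallest[0]);
                    out.newLine();
                    String next = readers.get((int) smallest[1]).readLine();
                    if (next != null) heap.add(new Object[]{next, smallest[1]});
                }
            }
        } finally {
            for (BufferedReader r : readers) r.close();
        }
    }

    public static void main(String[] args) throws IOException {
        Path input = Paths.get(args[0]);   // one "key<TAB>value" record per line
        Path output = Paths.get(args[1]);
        mergeRuns(writeSortedRuns(input), output);
    }
}
```

After the merge, records with equal keys end up on adjacent lines, so a final sequential pass can combine their values if that is what you need.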
Among other applications, this is the approach Lucene uses to build the "inverted indexes" it searches text with. The “keys” are the words occurring in the documents, and the “values” are the lists of documents in which each word appears. Lucene reads the documents and, for each word, adds an entry to an in-memory index. When memory fills up, it writes that index segment to disk; when many segments have accumulated on disk, they are merged into a single segment. In fact, you could probably adapt Lucene's index writer directly to your task.
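As a rough illustration (not Lucene's actual implementation), the same idea applied to an inverted index looks like this: buffer word → document-id postings in memory, flush a sorted segment file whenever the buffer fills, and later merge the segments just like the sorted runs above, concatenating the posting lists of equal words. The class name, file format, and threshold here are all hypothetical:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SegmentedIndexer {

    private static final int FLUSH_THRESHOLD = 1_000_000; // tune to the available heap

    private final TreeMap<String, List<Integer>> postings = new TreeMap<>(); // word -> doc ids
    private final List<Path> segments = new ArrayList<>();
    private int bufferedEntries = 0;

    public void addDocument(int docId, String text) throws IOException {
        for (String word : text.toLowerCase().split("\\W+")) {
            if (word.isEmpty()) continue;
            postings.computeIfAbsent(word, w -> new ArrayList<>()).add(docId);
            if (++bufferedEntries >= FLUSH_THRESHOLD) flushSegment();
        }
    }

    // Write the buffered postings as one sorted segment: "word<TAB>doc1,doc2,...".
    // Call this once more after the last document to flush the remaining buffer.
    public void flushSegment() throws IOException {
        if (postings.isEmpty()) return;
        Path segment = Files.createTempFile("segment-" + segments.size() + "-", ".txt");
        List<String> lines = new ArrayList<>(postings.size());
        for (Map.Entry<String, List<Integer>> e : postings.entrySet()) {
            StringBuilder sb = new StringBuilder(e.getKey()).append('\t');
            for (int i = 0; i < e.getValue().size(); i++) {
                if (i > 0) sb.append(',');
                sb.append(e.getValue().get(i));
            }
            lines.add(sb.toString());
        }
        Files.write(segment, lines);
        segments.add(segment);
        postings.clear();
        bufferedEntries = 0;
    }

    public List<Path> segments() {
        return segments;
    }
}
```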
The work can be divided across several threads, but you must be careful about disk contention: seeking back and forth to read and write many files at the same time slows a traditional spinning drive down considerably. It may still be possible to overlap some activities; for example, you could read new data from one file while writing the previous sorted run to disk, especially if the machine has two disks. And of course, using an SSD for temporary storage of the sorted segments would help a lot.
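Here is a minimal sketch of that overlap, building on the external-sort example above: a single background thread sorts and writes the previous chunk while the main thread reads the next one. Waiting on the previous Future before submitting the next keeps at most two chunks in memory; the class name and chunk handling are just illustrative assumptions:

```java
import java.io.BufferedReader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PipelinedRunWriter {

    public static List<Path> writeRunsPipelined(Path input, int maxLines) throws Exception {
        ExecutorService background = Executors.newSingleThreadExecutor();
        List<Path> runs = new ArrayList<>();
        try (BufferedReader in = Files.newBufferedReader(input)) {
            Future<Path> previous = null;
            List<String> chunk = readChunk(in, maxLines);
            while (!chunk.isEmpty()) {
                if (previous != null) runs.add(previous.get()); // at most two chunks alive
                final List<String> toFlush = chunk;
                previous = background.submit(() -> {
                    Collections.sort(toFlush);
                    Path run = Files.createTempFile("run-", ".txt");
                    Files.write(run, toFlush); // ideally on a different disk than the input
                    return run;
                });
                chunk = readChunk(in, maxLines); // overlaps with sorting/writing toFlush
            }
            if (previous != null) runs.add(previous.get());
        } finally {
            background.shutdown();
        }
        return runs;
    }

    private static List<String> readChunk(BufferedReader in, int maxLines) throws Exception {
        List<String> chunk = new ArrayList<>();
        String line;
        while (chunk.size() < maxLines && (line = in.readLine()) != null) chunk.add(line);
        return chunk;
    }
}
```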