I shipped commercial software that did just that. In the last iteration, we ended up sorting the file blocks by "type" and "index," so you could read or write "the third block of type foo." The file was structured as:
1) The file header. Points at the index of type indexes.
2) Data. Each block has a small header with its type, index, logical size, and padded size.
3) Per-type indexes: an array of (offset, size) tuples for each type.
4) An index of indexes: an array of (type, offset, count) tuples that tracks the per-type indexes.
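Roughly, the on-disk records for those four sections could look like the C sketch below; all field names and widths here are my guesses, not the shipped format:

    /* Hypothetical on-disk records for sections 1-4 above; field
       names and widths are illustrative, not the real format. */
    #include <stdint.h>

    struct file_header {        /* 1) front of the file */
        uint32_t magic;
        uint64_t type_table_offset;   /* points at 4) */
        uint32_t type_table_count;
    };

    struct block_header {       /* 2) precedes each data block */
        uint32_t type;
        uint32_t index;         /* "third block of type foo" => index 2 */
        uint64_t logical_size;  /* payload bytes in use */
        uint64_t padded_size;   /* bytes actually reserved on disk */
    };

    struct block_entry {        /* 3) per-type arrays of these */
        uint64_t offset;
        uint64_t size;
    };

    struct type_entry {         /* 4) index of indexes, one per type */
        uint32_t type;
        uint64_t offset;        /* where this type's block_entry array lives */
        uint64_t count;
    };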
We defined it so that each block was an atomic unit: you started writing a new block and finished writing it before starting anything else. You could also "set" the contents of an existing block. A new block was always appended at the end of the file, so you could add as many as you wanted without fragmenting the file. A "set" could reuse a free block.
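In API terms, that invariant might look something like this (hypothetical signatures; the actual function names aren't given here):

    #include <stdint.h>

    /* Hypothetical handle and calls; names are illustrative. */
    typedef struct ff_file ff_file;

    /* Append a new block of `type` at the end of the file and return
       its index within that type. Must run to completion before any
       other write begins. */
    int ff_append_block(ff_file *f, uint32_t type, const void *data,
                        uint64_t size, uint32_t *out_index);

    /* Replace the contents of an existing block; may reuse a free
       block rather than growing the file. */
    int ff_set_block(ff_file *f, uint32_t type, uint32_t index,
                     const void *data, uint64_t size);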
When you opened the file, we loaded all the indexes into RAM. When you flushed or closed the file, we appended each index that had changed to the end of the file, then appended the index of indexes after those, and finally updated the header at the front. This meant changes to the file were atomic: either you got past the point where the header was updated, or you didn't. (Some systems keep two copies of the header, 8 kB apart, and alternate between them so the header survives even a bad disk sector; we didn't go that far.)
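The write ordering is what buys the atomicity, so a flush could be sketched like this; the helper functions and the fsync() placement are my assumptions, not the original code:

    #include <stdint.h>
    #include <unistd.h>

    /* Helpers below are assumed, not part of any real API. */
    extern int append_dirty_type_indexes(int fd);                 /* step 1 */
    extern int append_index_of_indexes(int fd, uint64_t *offset); /* step 2 */
    extern int overwrite_header(int fd, uint64_t index_offset);   /* step 3 */

    /* Commit ordering from the paragraph above: nothing the header
       points at is overwritten until the appends are durable. */
    int ff_flush(int fd) {
        uint64_t ioi_offset;
        if (append_dirty_type_indexes(fd) != 0) return -1;
        if (append_index_of_indexes(fd, &ioi_offset) != 0) return -1;
        if (fsync(fd) != 0) return -1;    /* appends hit disk first */
        if (overwrite_header(fd, ioi_offset) != 0) return -1;
        return fsync(fd);                 /* header update is the commit point */
    }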
One of the "types" of the block is the "free block". When rewriting index changes and replacing the contents of a block, the old disk space was merged into a free list stored in an array of free blocks. Adjacent free blocks were combined into one larger block. Free blocks were reused when you “installed content” or for updated type block indexes, but not for the index index that was always written last.
Because the indexes were always resident in memory, working with an open file was very fast: usually just a single read to get the data of one block (or to get a block descriptor for streaming). Opening and closing was a bit more involved, as it required loading and flushing the indexes. If that had become a problem, we could have loaded the per-type indexes on demand rather than up front, to amortize the cost, but it never was a problem for us.
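With the indexes resident, "read block k of type t" is one lookup in RAM plus one pread(); roughly, with a hypothetical in-memory layout:

    #include <stdint.h>
    #include <stdlib.h>
    #include <unistd.h>

    struct blk { uint64_t offset, size; };
    struct type_index { uint32_t type; size_t count; struct blk *blocks; };

    /* find_type() over the in-memory index of indexes is assumed. */
    extern struct type_index *find_type(uint32_t type);

    /* One lookup in RAM, one pread() on disk; indexes never touch disk. */
    ssize_t read_block(int fd, uint32_t type, size_t k, void **out) {
        struct type_index *ti = find_type(type);
        if (!ti || k >= ti->count) return -1;
        void *buf = malloc(ti->blocks[k].size);
        if (!buf) return -1;
        ssize_t n = pread(fd, buf, ti->blocks[k].size,
                          (off_t)ti->blocks[k].offset);
        if (n < 0) { free(buf); return -1; }
        *out = buf;
        return n;
    }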
First priority for on-disk storage: reliability! Do not lose data, even if the computer loses power while working with the file! Second priority: do not do more I/O than necessary! Seeks are expensive. On flash drives, each individual I/O is expensive, and writes doubly so. Try to align and batch your I/O. Using something like malloc() to allocate disk space usually doesn't work out well, because it seeks too much. This is also why I don't really like memory-mapped files: people tend to treat them as if they were RAM, and then the I/O pattern becomes very expensive.
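To make the batching point concrete: instead of many tiny write() calls, accumulate records into one chunk and write it in a single call. A toy sketch, with an illustrative 64 kB batch size:

    #include <string.h>
    #include <unistd.h>

    #define CHUNK (64 * 1024)   /* illustrative batch/alignment size */

    struct batch { int fd; size_t used; unsigned char buf[CHUNK]; };

    /* Issue one large write instead of many small ones. */
    int batch_flush(struct batch *b) {
        ssize_t n = write(b->fd, b->buf, b->used);
        if (n < 0 || (size_t)n != b->used) return -1;
        b->used = 0;
        return 0;
    }

    /* Buffer a small record; flush when the chunk fills up. */
    int batch_put(struct batch *b, const void *p, size_t len) {
        if (len > CHUNK) return -1;  /* keep the sketch simple */
        if (b->used + len > CHUNK && batch_flush(b) != 0) return -1;
        memcpy(b->buf + b->used, p, len);
        b->used += len;
        return 0;
    }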