I need to store hundreds of thousands (eventually, possibly many millions) of documents that start out empty and are appended to frequently, but are never otherwise updated or deleted. The documents are not related to each other in any way; they just need to be accessible by a unique identifier.
Read access is to a subset of a document, almost always starting at some indexed point partway through (for example, "document #4324319, saves #53 to the end").
These documents start out very small, at a few KB. They typically reach a final size of around 500 KB, but many grow to 10 MB or more.
I'm currently using MySQL (InnoDB) to store these documents. Each incremental save is simply dumped into one big table with the identifier of the document it belongs to, so reading part of a document looks like "select * from save where document_id = 14 and save_id > 53 order by save_id", and then I manually concatenate everything together in code.
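To make the read pattern concrete, here is a minimal sketch of that query-and-concatenate step. It uses Python with an in-memory sqlite3 database as a stand-in for MySQL/InnoDB; the table and column names follow the query above, and the sample data values are made up for illustration.

```python
import sqlite3

# In-memory stand-in for the MySQL "save" table described above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE save (document_id INTEGER, save_id INTEGER, data BLOB)")
conn.executemany(
    "INSERT INTO save VALUES (?, ?, ?)",
    [(14, i, f"chunk{i};".encode()) for i in range(1, 6)],  # saves 1..5 of doc 14
)

def read_from(document_id: int, start_save_id: int) -> bytes:
    """Fetch all saves after start_save_id and concatenate them in order."""
    rows = conn.execute(
        "SELECT data FROM save WHERE document_id = ? AND save_id > ? ORDER BY save_id",
        (document_id, start_save_id),
    )
    return b"".join(r[0] for r in rows)

print(read_from(14, 3))  # saves 4 and 5 -> b"chunk4;chunk5;"
```

The point is that every partial read is an indexed range scan plus an in-application concatenation, which is what a replacement store would need to do at least as cheaply.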
Ideally, I'd like the storage solution to scale horizontally with ease, with redundancy across servers (for example, each document stored on at least 3 nodes) and easy recovery of failed servers.
I've looked at CouchDB and MongoDB as possible replacements for MySQL, but I'm not sure either of them makes much sense for this particular application, though I'm open to being convinced.
Any input on a good data storage solution?
database mongodb couchdb storage
Ben Dilts