I'm working on a real-time application with the following characteristics:
- Hundreds of clients will be inserting rows/documents at the same time, each client inserting a row every couple of seconds.
- Largely append-only; once inserted, almost all rows/documents are never changed.
- The client should only see success once the data has been flushed to disk, and from then on read-your-own-record consistency should hold.
- Clients are willing to wait on the order of seconds for confirmation: long enough for multiple disk seeks and writes to happen.
- There is too much data to fit in RAM (which rules out options like Redis). However, rows written long ago are rarely read, so they don't need to stay cached in memory.
- Ideally, these writes should not block reads.
- A key-value store is fine, but there needs to be at least one reliable auto-incrementing index.
In other words (and tl;dr): clients can tolerate latency, but they need a lot of reliable write throughput, more than "one record = one disk operation" would allow.
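(A rough back-of-envelope, with all numbers assumed rather than measured: a few hundred clients each inserting every couple of seconds is on the order of 100-500 inserts per second, while a single spinning disk only sustains roughly 100-200 fsyncs per second, so a naive one-fsync-per-insert design is already at or past the limit; batching, say, 100 inserts per flush brings that down to a handful of fsyncs per second.)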
I imagine a database implemented something like this: accept a large number of TCP connections (theoretically limited only by file descriptors), buffer the incoming records in memory, write them to disk in batches as often as it can (along with the auto-increment index updates), and respond on each TCP connection only once the associated disk write has completed. Alternatively, it could be as simple as a lazily-writing database that publishes a notification once it has written to disk (clients wait for the lazy response, then wait for that write notification before reporting success).
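To make that concrete, here is a minimal sketch in Go of the kind of batching/acknowledgement loop I have in mind. It is purely illustrative, not any real database's implementation; the type names and the 20 ms flush window are made up, and error handling is omitted:

```go
// Group-commit sketch: many writers hand records to a single flusher
// goroutine, which appends them to a log file, fsyncs once per batch,
// and only then unblocks every writer in that batch.
package main

import (
	"fmt"
	"os"
	"time"
)

type record struct {
	payload []byte
	done    chan uint64 // receives the auto-increment ID once durable
}

type groupCommitLog struct {
	file     *os.File
	incoming chan record
}

func newGroupCommitLog(path string) (*groupCommitLog, error) {
	f, err := os.OpenFile(path, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o644)
	if err != nil {
		return nil, err
	}
	l := &groupCommitLog{file: f, incoming: make(chan record, 1024)}
	go l.flushLoop()
	return l, nil
}

// Append blocks until the record has been fsynced, then returns its ID.
func (l *groupCommitLog) Append(payload []byte) uint64 {
	r := record{payload: payload, done: make(chan uint64, 1)}
	l.incoming <- r
	return <-r.done
}

// flushLoop collects whatever has arrived within a short window,
// writes the whole batch, fsyncs once, and acknowledges everyone.
func (l *groupCommitLog) flushLoop() {
	var nextID uint64
	ticker := time.NewTicker(20 * time.Millisecond) // batching window (tunable)
	var batch []record
	for {
		select {
		case r := <-l.incoming:
			batch = append(batch, r)
		case <-ticker.C:
			if len(batch) == 0 {
				continue
			}
			for _, r := range batch {
				l.file.Write(r.payload)
				l.file.Write([]byte("\n"))
			}
			l.file.Sync() // one fsync covers the whole batch
			for _, r := range batch {
				nextID++
				r.done <- nextID
			}
			batch = batch[:0]
		}
	}
}

func main() {
	log, err := newGroupCommitLog("records.log")
	if err != nil {
		panic(err)
	}
	id := log.Append([]byte("example row"))
	fmt.Println("durable with id", id)
}
```

Each client connection handler would just call Append and report success to its client when it returns; latency per insert is bounded by the flush window plus one batched write, rather than one dedicated disk operation per record.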
I'd think that with such a high latency tolerance this isn't asking too much. And I'd imagine others have this problem, for example financial companies that can't afford to lose data but can afford to delay the response to any one client.
Are there any battle-tested databases, such as Postgres, CouchDB/Couchbase, or MongoDB, that support a mode of operation like this?