A suitable storage backend for an Erlang application when the data does not fit in memory

I am exploring options for organizing data storage for an Erlang application. The data it needs to handle is basically a huge collection of binary blobs indexed by short string identifiers. Each blob is under 10 KB, but there are many of them. I expect them to total up to 200 GB, so obviously the data cannot fit into memory. A typical operation on this data is reading a blob by its identifier, updating a blob by its identifier, or adding a new one. In any given period only a subset of identifiers is in use, so access to the data store could benefit from an in-memory cache. Speaking of performance, it is critical. The goal is about 500 reads and 500 updates per second on commodity hardware (say, an EC2 VM).

Any suggestions on what to use here? As far as I understand, dets is out of the question since it is limited to 2G (or is it 4G?). Mnesia is probably out of the question too; my impression is that it was mainly designed for cases where the data fits in memory. I am considering trying the EDTK Berkeley DB driver for this task. Would it work in the scenario above? Does anyone have experience using it in production under similar conditions?

+4
5 answers

tcerl came about because of the same size limit. I am not using Erlang these days, but it sounds like what you are looking for.

+5

Have you looked at what CouchDB does? It may not be exactly what you need as a drop-in product, but it contains a lot of Erlang code for storing data. There is also some talk about providing a native Erlang interface instead of the REST API.

+1

Is there a reason why you cannot just use the file system, treating the file name as your string identifier and the contents of the file as the binary blob? You can choose a file system that meets your performance requirements, and you should get caching mostly for free from your OS.
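
As a rough illustration of the file-per-blob idea, here is a minimal sketch. The module name `blob_fs`, the functions `write/3` and `read/2`, and the directory layout are hypothetical choices, not something from the question. It assumes `Root` is a string and ids are non-empty binaries. Ids are hex-encoded so that any byte is safe in a file name, and files are spread over 256 subdirectories by hash, since 200 GB of blobs under 10 KB each means at least roughly 20 million files.

```erlang
-module(blob_fs).
-export([write/3, read/2]).

%% Hypothetical layout: Root/<2-hex-digit shard>/<hex-encoded id>.
%% Root must be a string (list), Id a non-empty binary.
path(Root, Id) when is_list(Root), is_binary(Id), byte_size(Id) > 0 ->
    Shard = io_lib:format("~2.16.0b", [erlang:phash2(Id, 256)]),
    Name = [io_lib:format("~2.16.0b", [B]) || <<B>> <= Id],
    filename:join([Root, lists:flatten(Shard), lists:flatten(Name)]).

%% Add or replace a blob. The data is written to a temporary file and
%% then renamed, so readers never observe a half-written blob on POSIX
%% file systems. Concurrent writers to the same id would share the
%% temporary name; serializing writes per id is left to the caller.
write(Root, Id, Blob) when is_binary(Blob) ->
    Path = path(Root, Id),
    Tmp = Path ++ ".tmp",
    ok = filelib:ensure_dir(Path),
    ok = file:write_file(Tmp, Blob),
    file:rename(Tmp, Path).

%% Read a blob by id.
read(Root, Id) ->
    case file:read_file(path(Root, Id)) of
        {ok, Blob} -> {ok, Blob};
        {error, enoent} -> not_found;
        {error, _} = Err -> Err
    end.
```

With this sketch, `blob_fs:write("/tmp/blobs", <<"user:42">>, <<1,2,3>>)` followed by `blob_fs:read("/tmp/blobs", <<"user:42">>)` should give back `{ok, <<1,2,3>>}`. With 256 shards and about 20 million files, each directory would hold on the order of 80,000 entries, so a second shard level may be worth adding depending on the file system.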

+1

Mnesia can store data on disk very well. There is also dets (disk-based term storage), which is roughly the same as Berkeley DB. It is in the standard library: http://www.erlang.org/doc/apps/stdlib/index.html
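
For reference, basic dets usage looks roughly like the sketch below. The module name `blob_dets`, the table name `blobs`, and the file name `blobs.dets` are hypothetical. Note that the size limit raised in the question still applies: a single dets file cannot grow beyond 2 GB, so a 200 GB data set would have to be split across many tables.

```erlang
-module(blob_dets).
-export([demo/0]).

%% Open (or create) a disk-backed table, store one blob, read it back.
demo() ->
    {ok, T} = dets:open_file(blobs, [{file, "blobs.dets"}, {type, set}]),
    %% With type set, inserting an existing key replaces the old object.
    ok = dets:insert(T, {<<"id-1">>, <<"payload">>}),
    [{<<"id-1">>, Blob}] = dets:lookup(T, <<"id-1">>),
    %% A missing key yields an empty list.
    [] = dets:lookup(T, <<"missing">>),
    ok = dets:close(T),
    Blob.
```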

0

I would recommend Apache CouchDB.

It works great with Erlang, and by the sound of it (you mention ID-based blobs and don't mention any relational requirements), you are looking for a document-oriented database.

Since it has a REST interface, you can simply put an off-the-shelf HTTP cache in front of it if you need caching.

The documentation for CouchDB is very high quality.

It also has a built-in Map-Reduce :)

0