NoSQL database and many semi-large blobs

Is there a NoSQL database (or another type) suitable for storing a large number (i.e. > 1 billion) of "medium" blobs (that is, from 20 KB to 2 MB)? All I need is a mapping from A (an identifier) to B (a blob), the ability to retrieve B given A, a consistent external API for access, and the ability to "just add another computer" to scale the system.

Something simpler than a database, for example a distributed key-value system, would be just fine, and I would be grateful for any thoughts on this.

Thanks for reading.

Brian

+4
3 answers

If your API requirements are fully covered by "Get(key), Put(key, blob), Remove(key)", then a key-value store (or, more precisely, a "persistent distributed hash table") is exactly what you are looking for.
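To make that three-call contract concrete, here is a minimal sketch in Python. The `BlobStore` class and its method names are hypothetical, and the in-memory dict is only a stand-in for whatever persistent, distributed backend you end up choosing; it is not the API of any product mentioned here.

```python
from typing import Dict, Optional


class BlobStore:
    """Hypothetical minimal key-value contract: put, get, remove."""

    def __init__(self) -> None:
        # In-memory stand-in; a real store would persist and distribute this.
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, blob: bytes) -> None:
        # Store an immutable copy of the blob under the given key.
        self._data[key] = bytes(blob)

    def get(self, key: str) -> Optional[bytes]:
        # Return the blob, or None if the key is unknown.
        return self._data.get(key)

    def remove(self, key: str) -> bool:
        # Return True if something was actually deleted.
        return self._data.pop(key, None) is not None


store = BlobStore()
store.put("image-0001", b"example blob bytes")
```

If your application never needs more than these three calls, almost any KV system can sit behind such an interface, which keeps the choice of backend easy to revisit later.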

There are many of these available, but without additional information it is difficult to make a firm recommendation. Which OS are you targeting? What language(s) do you work in? What are the I/O characteristics of your application (cold, unchanging data such as images, or high write loads such as tweets)?

Some of the KV systems worth looking at are:

- MemcacheDB
- Berkeley DB
- Voldemort

You could also look at document stores such as CouchDB or RavenDB*. Document stores are similar to KV stores, but they understand the persisted format (usually JSON), so they can provide additional services such as indexing.

* If you are developing on .NET, then skip directly to RavenDB (you'll thank me later).
+2

How about Jackrabbit?

Apache Jackrabbit™ is a fully conforming implementation of the Content Repository for Java Technology API (JCR, specified in JSR 170 and JSR 283).

A content repository is a hierarchical content store with support for structured and unstructured content, full-text search, versioning, transactions, observation, and more.

I got to know Jackrabbit when I was working with the Liferay CMS. Liferay uses Jackrabbit to implement its Document Library. It stores user files in the server's file system.

+1

You will also want to take a look at Riak. Riak is very focused on doing exactly what you are asking for (just add a node, easy access).
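For a feel of why "just add a node" can be cheap, here is a rough sketch of consistent hashing, the general technique that Dynamo-style stores use to spread keys across machines. This is not Riak's code: the `HashRing` class, the node names, and the virtual-node count are all made up for illustration.

```python
import bisect
import hashlib
from typing import List, Tuple


def _hash(value: str) -> int:
    # Map a string to a 64-bit position on the ring.
    return int.from_bytes(hashlib.sha1(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes."""

    def __init__(self, vnodes: int = 64) -> None:
        self.vnodes = vnodes
        self._ring: List[Tuple[int, str]] = []  # sorted (position, node) pairs

    def add_node(self, node: str) -> None:
        # Each physical node claims several positions to even out the load.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def node_for(self, key: str) -> str:
        if not self._ring:
            raise LookupError("ring has no nodes")
        # First ring position at or after the key's hash, wrapping around.
        idx = bisect.bisect_right(self._ring, (_hash(key), "")) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing()
for name in ("node-a", "node-b", "node-c"):
    ring.add_node(name)

keys = [f"blob-{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node-d")  # "just add another computer"
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

The point of the exercise: only the keys in `moved` change owner, and all of them go to the new node, so the existing machines never have to shuffle blobs among themselves.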

+1
