Which key / value storage is the most promising / stable?

Question


I want to start using a key/value store for some side projects (mainly as a learning experience), but so many have appeared lately that I don't know where to start. Just listing from memory, I can think of:

CouchDB
MongoDB
Riak
Redis
Tokyo Cabinet
Berkeley DB
Cassandra
MemcacheDB

And I'm sure there are others that have slipped through my search efforts. With all the information out there, it's hard to find reliable comparisons between all the competitors. My criteria and questions:

(most important) What do you recommend and why?
Which one is the fastest?
Which one is the most stable?
Which one is the easiest to set up and install?
Which ones have bindings for Python and/or Ruby?

Edit:
So far it looks like Redis is the best solution, but that is only because I have received one solid answer (from ardsrk). I am looking for more answers like his, because they point me towards useful, quantitative information. Which key/value store are you using, and why?

Edit 2:
If anyone has experience with CouchDB, Riak or MongoDB, I would like to hear your experience with them (and even more so if you can offer a comparative analysis of several of them)

+58

python comparison ruby database

Mike Trpcic Mar 04 '10 at 4:17


15 answers

You need to understand what the modern NoSQL phenomenon is.
It is not about key/value stores. They have been available for decades (e.g. BerkeleyDB). Why all the fuss now?

It is not about fancy document or object-oriented schemas, or about overcoming the "impedance mismatch." Supporters of these features have been touting them for years, and they have gotten nowhere.

It is simply about 3 problems: automatic (for maintainers) and transparent (for application developers) failover, sharding and replication. Therefore, you should ignore any fashionable products that do not deliver on this front. These include Redis, MongoDB, CouchDB, etc. Focus instead on truly distributed solutions like Cassandra, Riak, etc.

Otherwise, you will lose all the good things that SQL gives you (ad hoc queries, Crystal Reports for your boss, third-party tools and libraries) and get nothing in return.

+24

Vagif Verdi Apr 11 '10 at 6:22


At this year's PyCon, Jeremy Edberg of Reddit gave a talk:

http://pycon.blip.tv/file/3257303/

He said that Reddit uses Postgres as a key/value store, presumably with a simple two-column table; according to him, it benchmarked faster than any of the other key/value stores they had tried. And, of course, it is very mature.

Ultimately, OverClocked is right; your use case determines the best store. But RDBMSs have long been (ab)used as key/value stores, and they can also be very fast.
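
To make the "simple two-column table" idea concrete, here is a minimal sketch of a relational table used as a key/value store. It is only an illustration: Python's built-in sqlite3 module stands in for Postgres so that it runs without a server, and the table name `kv`, its column names and the `KVStore` class are hypothetical, not Reddit's actual schema.

```python
import sqlite3


class KVStore:
    """Hypothetical key/value wrapper around a two-column SQL table."""

    def __init__(self, path=":memory:"):
        # SQLite stands in for Postgres here so the sketch needs no server.
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value BLOB)"
        )

    def put(self, key, value):
        # INSERT OR REPLACE overwrites any existing row with the same key.
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)",
                (key, value),
            )

    def get(self, key, default=None):
        # Return the stored value, or `default` if the key is absent.
        row = self.conn.execute(
            "SELECT value FROM kv WHERE key = ?", (key,)
        ).fetchone()
        return row[0] if row else default

    def delete(self, key):
        with self.conn:
            self.conn.execute("DELETE FROM kv WHERE key = ?", (key,))
```

The primary key on the `key` column is what gives indexed lookups; on a real Postgres server the same shape would apply, though the upsert syntax differs.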

+8

AdamKG Mar 04 '10 at 16:00


They all have different features. And don't forget Project Voldemort, which is actually used by LinkedIn in production and tested there before each release.

It's hard to compare. You should ask yourself what you need. For example, do you want partitioning? If so, some of them, such as CouchDB, will not support it. Do you want erasure coding? Then most of them do not have it. And so on.

Berkeley DB is a very simple low-level storage engine that could arguably be excluded from this discussion. Several key/value systems are built on top of it to provide additional features such as replication, versioning, coding, etc.

Also, what does your application need? Some of the solutions carry complexity that you may not need. For example, if you just store static data that will never change, you can store each item under the SHA-1 hash of its content (i.e. use the content hash as the key). In this case, you do not need to worry about freshness, synchronization or versioning, and a lot of complexity can be eliminated.
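
The content-hash idea can be sketched in a few lines of Python. This is only an illustration: the `ContentStore` class is hypothetical, and a plain dict stands in for whichever key/value store you end up choosing.

```python
import hashlib


class ContentStore:
    """Hypothetical store that keys each value by the SHA-1 of its content."""

    def __init__(self):
        # A dict stands in for the real key/value backend.
        self._data = {}

    def put(self, content: bytes) -> str:
        # The key is derived from the content, so identical content
        # always maps to the same key and never needs updating.
        key = hashlib.sha1(content).hexdigest()
        self._data[key] = content
        return key

    def get(self, key: str) -> bytes:
        return self._data[key]
```

Because a key can only ever refer to one piece of content, any copy of a value is as fresh as any other, which is why versioning and synchronization concerns drop away.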

+7

OverClocked Mar 04 '10 at 4:25


I have played with MongoDB, and it has one thing that makes it ideal for my application: the ability to store complex Maps/Lists in the database directly. I have a large Map where each value is a List, and I don't have to do anything special to write and retrieve it, nor do I need to know all the different keys and list values in advance. I don't know much about the other options, but the speed and this ability make Mongo ideal for my application. In addition, the Java driver is very easy to use.

+7

MattGrommes Mar 04 '10 at 16:24


One distinction you have to make is what you will use the DB for. Do not jump on board just because it is fashionable. Do you need a key/value store, or do you need a document-based store? What is your memory footprint requirement? Will you run it on a small virtual machine or on a dedicated one?

I recommend that you first list your requirements and then see which stores match them.

With that said, I have used CouchDB and MongoDB and prefer MongoDB for its ease of setup and its easier transition from MySQL-style queries. I chose MongoDB over SQL because of dynamic schemas (no migration files!) and better data modeling (arrays, hashes). I did not evaluate based on scalability.

MongoMapper is a great MongoDB ORM tool for Ruby, and it already has a working Rails 3 fork.

I have listed some details about why I preferred mongodb in my scribd slides http://tommy.chheng.com/index.php/2010/02/mongodb-for-natural-development/

+6

tommy chheng Mar 04 '10 at 16:49


I notice that everyone is confusing memcached with memcachedb. These are two different systems. The OP asked about memcachedb.

memcached is in-memory. memcachedb uses Berkeley DB as its datastore.

+6

drr Mar 05 '10 at 3:29


I have experience with Berkeley DB, so I’ll talk about what I like.

It is fast.
It is very mature and stable.
It has excellent documentation.
It has C, C++, Java and C# bindings out of the box. Other language bindings are available. I believe Python ships with bindings as part of its "batteries".

The only drawback I have come across is that the C# bindings are new and do not seem to support every feature.

+5

Ferruccio Mar 13 '10 at 11:44


There is also ZODB.

+4

mikerobi Mar 04 '10 at 16:32


Which key value storage is the most promising / stable?

The G-WAN KV store looks rather promising:

 DB engine            Traversal
 ------------------   ----------------------------
 SQLite               0.261 ms (b-tree)
 Tokyo-Cabinet (TC)   4.188 ms (hash table)
 TC-FIXED             0.103 ms (fixed-size array)
 G-WAN KV             0.010 ms (unnamed)

In addition, it is used internally by the G-WAN web server, which is known for its high concurrency rates (which speaks to its stability).

+4

Bert Jun 06 2018-11-11T00:


I really like memcached.

I use it on several of my sites, and it is simple, fast and easy. It really is incredibly simple to use, and the API is straightforward. It does not store anything on disk, hence the name memcached, so it is not an option if you are looking for a persistent storage engine.

Python has python-memcached.

I have not used a Ruby client, but a quick Google search turns up RMemCache.

If you only need a caching engine, memcached is the way to go. It is mature, it is stable, and it is blazingly fast. There is a reason LiveJournal created it and Facebook develops it. It is used on some of the largest sites out there to great effect. It scales very well.

+3

Xorlev Mar 04 '10 at 5:50


Cassandra seems popular.

Cassandra is used by Digg, Facebook, Twitter, Reddit, Rackspace, Cloudkick, Cisco, SimpleGeo, Ooyala, OpenX and other companies with large active datasets. The largest production cluster has more than 100 TB of data in more than 150 machines.

+2

yfeldblum Apr 11 '10 at 8:01


Just to make the list complete: there is Dreamcache. It is compatible with Memcached (in terms of protocol, so you can use any client library written for Memcached), it's just faster.

+1

grokk Mar 04


As others have said, it always depends on your needs. I, for example, prefer whatever works best for my applications.

At first I used memcached for fast read/write access. As a Java API, I used SpyMemcached, which comes with a very simple interface that you can use to write and read data. Because I kept running out of memory (no more RAM), I had to look for another solution. I also could not scale properly; just increasing the memory for a single process did not seem like a good approach.

After some review, I came across Couchbase. It comes with replication, clustering, automatic failover and a community edition (MS Windows, macOS, Linux). The best thing for me was that its Java client builds on SpyMemcached, so I had almost nothing to do other than set up the server and use Couchbase instead of memcached as the datastore. The advantage? My data is now persistent, replicated and indexed. It also comes with a web console for writing map/reduce functions for document views in Erlang.

It supports Python, Ruby, .NET and more, and configuration is easy with the web console and client tools. It runs stably. In some tests, I was able to write about 10,000 records per second for records of 200-400 bytes in size. Read performance was higher, though (both tested locally). Have fun making your decision.

+1

Alex M Jan 29 '13 at 15:45


I only have experience with MongoDB, memcache and Redis. Here is a comparison between them and CouchDB.

MongoDB seems to be the most popular. It supports sharding and replication, is eventually consistent, and has good support in Ruby (Mongoid). It also has a richer feature set than the other two. MongoDB, Redis and memcache can all store key/value data in memory, but Redis seems to be much faster: according to this post, Redis is 2x faster on writes and 3x faster on reads than Mongo. Redis also has more advanced data structures and is more lightweight.

I would say that they have different use cases: MongoDB is probably good for large datasets and document storage, while memcache and Redis are better for storing caches or logs.

+1

Bruce Xinda Lin May 31 '13 at 23:46


ardsrk · Accepted Answer · 2010-03-04 05:35

What do you recommend and why?

I recommend Redis. Why? Read on!

Which one is the fastest?

I can't say whether it is the fastest. But Redis is fast. It is fast because it holds all the data in RAM. Recently, a virtual memory feature was added, but all keys still stay in main memory, and only rarely used values are swapped to disk.

Which one is the most stable?

Again, since I have no direct experience with the other key/value stores, I can't compare. However, Redis is used in production by many web applications, such as GitHub and Instagram, among many others.

Which one is the easiest to set up and install?

Redis is pretty easy to set up. Grab the source and, on a Linux box, run make install. This produces the redis-server binary, which you can put on your path and start.

redis-server binds to port 6379 by default. Take a look at redis.conf, which comes with the source, for more configuration options.

Which of them have bindings for Python and / or Ruby?

Redis has excellent Ruby and Python support.

In response to Xorlev's comment below: Memcached is just a simple key/value store. Redis supports complex data types, such as lists, sets and sorted sets, while at the same time providing a simple interface to these data types.

There is also make 32bit, which makes all pointers only 32 bits in size, even on 64-bit machines. This saves significant memory on machines with less than 4 GB of RAM.
