When to use CouchDB vs RDBMS

I have been looking at CouchDB, which has a number of attractive features compared with relational databases, including:

  • intuitive REST / HTTP interface
  • simple replication
  • data stored as documents, not normalized tables

I appreciate that it is not yet a mature product, so it should be adopted with caution, but I wonder whether it is actually a viable replacement for an RDBMS (despite the introductory page, http://couchdb.apache.org/docs/intro.html, which says otherwise).

  • In what circumstances would CouchDB be a better choice of database than an RDBMS (e.g. MySQL), for example in terms of scalability, design and development time, reliability, and maintenance?
  • Are there still cases where an RDBMS is the right choice?
  • Is this an either-or choice, or is a hybrid solution more likely to emerge as best practice?
+60
database couchdb rdbms
Aug 20 '09 at 15:47
7 answers

I recently attended a NoSQL conference in London, and I think I now have a better idea of how to answer the original question. I also wrote a blog post, and there are a couple of other good ones.

Key points:

  • We have accumulated more than 30 years of experience administering relational databases, so we should not replace them without careful consideration; non-relational data stores are less mature than relational ones, and therefore inherently riskier to adopt.
  • There are different types of non-relational data store: some are key-value stores, some are document stores, and some are graph databases.
  • You can use a hybrid approach, for example a combination of an RDBMS and a graph data store for a social networking site.
  • Document data stores (e.g. CouchDB and MongoDB) are probably the closest to relational databases. They provide a JSON data structure with all fields presented hierarchically, which avoids the need for table joins and (some might argue) is an improvement over the traditional object-relational mapping that most applications currently use.
  • Non-relational databases support replication (including master-master); relational databases also support replication, but it may not be as comprehensive as the non-relational option.
  • Very large sites such as Twitter, Digg and Facebook use Cassandra, which is built from the ground up to support clustering.
  • Relational databases are probably suitable for 90% of cases.

So the consensus seems to be "proceed with caution".

+43
Apr 28 2018-10-18T00:

Until someone gives a more detailed answer, here are some pros and cons of CouchDB.

Pros:

  • You don't need to fit your data into one of those pesky higher-order normal forms.
  • You can change the "schema" of your data at any time.
  • Your data is indexed specifically for your queries, so you get results in near-constant time (each view is a precomputed index, so a lookup does not scan the documents).

Cons:

  • You need to create a view for each query, i.e. ad-hoc queries (like concatenating dynamic WHERE and ORDER BY clauses in SQL) are not available.
  • You will either have redundant data, or you will end up implementing join and sorting logic yourself on the client side (for example, sorting many-to-many relationships on multiple fields).

Pros or Cons:

  • Creating your views is not as straightforward as writing SQL; it is more like solving a puzzle. Whether that is a pro or a con depends on what kind of person you are :)
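
To make the "one view per query" point concrete, here is a minimal in-memory sketch in Python. It is not the CouchDB API (real views are written as JavaScript map functions stored in design documents); the documents, field names and helper functions are all made up for illustration. It only shows the idea: the map function runs once per document to build a sorted index, and a query is then a lookup in that index.

```python
from bisect import bisect_left, bisect_right

# Hypothetical documents; the field names are invented for this example.
docs = [
    {"_id": "a", "type": "post", "author": "sam", "year": 2009},
    {"_id": "b", "type": "post", "author": "kim", "year": 2008},
    {"_id": "c", "type": "post", "author": "sam", "year": 2008},
]

def map_by_author(doc):
    """Map function: yields (key, value) rows, similar in spirit to emit(key, value)."""
    if doc.get("type") == "post":
        yield doc["author"], doc["_id"]

def map_by_year(doc):
    """A different question needs a different map function, hence a different view."""
    if doc.get("type") == "post":
        yield doc["year"], doc["_id"]

def build_view(docs, map_fn):
    """Run the map function over every document and keep the rows sorted by key."""
    rows = sorted(row for doc in docs for row in map_fn(doc))
    keys = [key for key, _ in rows]
    values = [value for _, value in rows]
    return keys, values

def query(view, key):
    """Find all rows with the given key by binary search over the sorted keys."""
    keys, values = view
    lo, hi = bisect_left(keys, key), bisect_right(keys, key)
    return values[lo:hi]

by_author = build_view(docs, map_by_author)
by_year = build_view(docs, map_by_year)

print(query(by_author, "sam"))  # ids of posts written by "sam"
print(query(by_year, 2008))     # ids of posts from 2008
```

Asking "posts by author" and "posts by year" takes two separate views here, which is the trade-off described in the cons above: lookups are cheap, but every new kind of question means writing and building another index.
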
+25
Aug 20 '09 at 16:09

CouchDB is one of several "key/value stores" available. Others include oldies like BDB, web-oriented ones like Persevere, MongoDB and CouchDB, new super-fast ones like memcached (RAM only) and Tokyo Cabinet, and huge stores like Hadoop and Google BigTable (MongoDB also claims to be in this space).

There is certainly room for both key/value stores and relational databases. Traditionally, most relational databases are treated as a layer on top of a key/value store. For example, MySQL used to offer BDB as an optional backend for tables. In short, key/value stores know nothing about the fields and relationships that are the foundation of SQL.

Key/value stores tend to scale more easily, which makes them an attractive choice when you are growing quickly, as Twitter did. Of course, this means that any relationship between stored values has to be managed by your code instead of simply being declared in SQL. CouchDB's approach is to store large "documents" in the value part, making them (mostly) self-contained, so you can get most of the data you need in a single request. Many use cases fit this idea; others do not.

A recurring theme I see is that, after the "Rails doesn't scale!" scare, many people now realize that scaling is not about your web framework but about intelligent caching: avoiding hits to the database, and even to the web app, whenever possible. The rising star there is memcached.
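
As a rough illustration of that difference, here is a small Python sketch. It uses plain dictionaries as stand-ins for the stores; the key names, fields and helper functions are invented for the example and do not correspond to any real client library.

```python
import json

# Key/value style: every value is an opaque string to the store, so the link
# between a post and its comments exists only in application code.
kv_store = {
    "post:1": json.dumps({"title": "Hello", "comment_ids": ["comment:1", "comment:2"]}),
    "comment:1": json.dumps({"text": "First!"}),
    "comment:2": json.dumps({"text": "Nice post"}),
}

def load_post_kv(store, key):
    """Resolve the relationship by hand: one lookup for the post, one per comment."""
    post = json.loads(store[key])
    comment_ids = post.pop("comment_ids")
    post["comments"] = [json.loads(store[cid]) for cid in comment_ids]
    return post

# Document style: the post carries its comments with it, so a single lookup
# returns (mostly) everything needed to render it.
doc_store = {
    "post:1": {
        "title": "Hello",
        "comments": [{"text": "First!"}, {"text": "Nice post"}],
    },
}

def load_post_doc(store, key):
    """One lookup, no relationship to resolve."""
    return store[key]

print(load_post_kv(kv_store, "post:1") == load_post_doc(doc_store, "post:1"))
```

Both functions return the same structure; the difference is where the work happens. The self-contained document stops being convenient when the embedded data is shared between many documents or grows without bound, which is one reason some use cases do not fit.
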

As always, it all depends on your needs.

+14
Aug 20 '09 at 16:16

This is a difficult question to answer, so I will try to highlight the areas in which CouchDB may work against you.

The two biggest sources of difficulty that people report on the CouchDB user and dev mailing lists are:

  • Complex joins of data.
  • Multi-step map/reduce.

Couch views are pretty much islands unto themselves. If you need to aggregate, merge or intersect a set of views, you pretty much have to do it at the application level. There are a few tricks you can do with view collation and complex keys to help with joins, but they only work for some kinds of data. This may or may not be acceptable, depending on the application. That said, this problem can often be reduced or eliminated by structuring your data differently.
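
The collation trick with complex keys can be sketched in plain Python as follows. This is an in-memory illustration only: the documents and function names are hypothetical, and Python tuple ordering merely stands in for CouchDB's own collation rules, which differ in detail. The idea is that a post emits a key ending in 0 and each of its comments emits a key starting with the same post id followed by 1, so that in the sorted view every post is immediately followed by its comments.

```python
# Hypothetical documents: two posts and their comments, in arbitrary order.
docs = [
    {"_id": "c2", "type": "comment", "post": "p1", "text": "Nice post"},
    {"_id": "p2", "type": "post", "title": "Second"},
    {"_id": "c1", "type": "comment", "post": "p1", "text": "First!"},
    {"_id": "p1", "type": "post", "title": "Hello"},
    {"_id": "c3", "type": "comment", "post": "p2", "text": "Agreed"},
]

def map_post_with_comments(doc):
    """Emit complex keys so that a post sorts just before its own comments."""
    if doc["type"] == "post":
        yield (doc["_id"], 0), doc["title"]
    elif doc["type"] == "comment":
        yield (doc["post"], 1, doc["_id"]), doc["text"]

def build_view(docs, map_fn):
    """Collect all emitted rows and sort them by key, as a view index would."""
    return sorted(row for doc in docs for row in map_fn(doc))

def rows_for_post(view, post_id):
    """Return the post and its comments in key order.

    A real view would answer this with a key-range request; a linear filter
    keeps the sketch short.
    """
    return [value for key, value in view if key[0] == post_id]

view = build_view(docs, map_post_with_comments)
print(rows_for_post(view, "p1"))
```

One range read over the view then returns the post together with its comments, which covers a simple one-to-many join. It does not help when the relationship cannot be expressed as a shared key prefix, which is where the application-level work comes in.
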

Other people's comments on this question show some of the kinds of data that work well in CouchDB.

Another thing to keep in mind is that, very often, the data you need to aggregate, merge or intersect is data you would process offline in an RDBMS anyway, so you may not lose anything by using CouchDB.

Short answer: I think that in the end CouchDB will be able to handle any problem you want to throw at it, but how comfortable you are with it will vary from developer to developer. I think this is somewhat subjective. I like having a Turing-complete language to query my data with, and keeping more of the logic in the application layer. Your mileage may vary.

+7
Aug 25 '09 at 2:50

Sam, you have to take a different approach with CouchDB, and with map/reduce-based databases in general. You cannot define a constraint such as unique, but you can query the data to check whether this email is already in use and whether this login is already in use. To get it right, you have to change the way you think.

+3
Feb 19 '10 at 5:09

Correct me if I am wrong: CouchDB is useless for cases where you need to enforce uniqueness of documents on several fields. For example, it is impossible to enforce a validation rule such as "both the username and the email address must be unique" and keep the data in a consistent state. You can check before saving the document, but someone may get in ahead of you, and the data will become inconsistent.
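
The race can be reproduced with a small in-memory Python sketch. `TinyStore` and its methods are invented for this example and are not the CouchDB API. The second half shows one commonly suggested workaround, which relies on the assumption that document ids are unique within a database: derive the id from the value that must be unique, so the store itself rejects the second write.

```python
class Conflict(Exception):
    """Raised when a document with the same id already exists."""

class TinyStore:
    """In-memory stand-in for a document database (hypothetical, not CouchDB)."""

    def __init__(self):
        self.docs = {}

    def find(self, field, value):
        """Return every document whose field equals value."""
        return [doc for doc in self.docs.values() if doc.get(field) == value]

    def create(self, doc_id, doc):
        """Store a new document; an existing id is rejected."""
        if doc_id in self.docs:
            raise Conflict(doc_id)
        self.docs[doc_id] = doc

# Check-then-save: both clients run the check before either one writes,
# so both checks pass and the username ends up stored twice.
racy = TinyStore()
checks = [not racy.find("username", "sam") for _ in range(2)]
for client, name_is_free in enumerate(checks):
    if name_is_free:
        racy.create(f"doc{client}", {"username": "sam"})
duplicates = len(racy.find("username", "sam"))

def register_with_id(store, username):
    """Use the username as the document id so the store enforces uniqueness."""
    try:
        store.create("user:" + username, {"username": username})
        return True
    except Conflict:
        return False

safe = TinyStore()
results = [register_with_id(safe, "sam") for _ in range(2)]

print(duplicates, results)
```

This only covers a single field. Making both the username and the email unique would take two documents, and as far as I know the pair cannot be written atomically, so the original objection still stands for multi-field constraints.
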

+2
Aug 26 '09 at 12:46

If you work with tabular data that has only a shallow hierarchy, then it is probably best to use an RDBMS. This is the main use case for an RDBMS, and the documentation and tool support are very good.

For more deeply nested data, such as XML, a document database should give you faster access to your data. In addition, the storage model is closer to the shape of the data itself, so retrieval should be more direct.

0
Aug 20 '09 at 16:12