How is column-oriented NoSQL different from document-oriented?

The three types of NoSQL databases that I have read about are key-value, column-oriented, and document-oriented.

Key-value is pretty straightforward: a key with a value.

I have seen document-oriented databases described as key-value stores, except that the value can be a structure similar to a JSON object. Each "document" can have all, some, or none of the same keys as the others.
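To make the question's description concrete, here is a minimal Python sketch that models a collection as a dict of plain dicts. It is a conceptual model only, not MongoDB's API; the `find` helper and the sample data are hypothetical.

```python
from typing import Any

# Each "document" is a nested dict; documents need not share the same keys.
users: dict[str, dict[str, Any]] = {
    "u1": {"name": "Ann", "email": "ann@example.com"},
    "u2": {"name": "Bob", "tags": ["admin"], "address": {"city": "Oslo"}},
    "u3": {"name": "Cy"},
}

def find(collection: dict[str, dict[str, Any]], key: str, value: Any) -> list[str]:
    """Hypothetical helper: ids of documents whose top-level `key` equals `value`.

    Documents that lack `key` entirely are simply skipped.
    """
    return [doc_id for doc_id, doc in collection.items() if doc.get(key) == value]

print(find(users, "name", "Bob"))  # ['u2']
```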

Column-oriented seems very similar to document-oriented, in that you do not specify a structure.

So what is the difference between the two, and why would you use one over the other?

I have specifically looked at MongoDB and Cassandra. I basically need a dynamic structure that can change without affecting other values. At the same time, I need to be able to search/filter on specific keys and run reports. In terms of CAP, AP is the most important to me. Data can "eventually" be synchronized between nodes, as long as there is no conflict or data loss. Each user will get their own "table".

mongodb cassandra nosql
Sep 27 '11 at 6:22
3 answers

In Cassandra, each row (addressed by a key) contains one or more columns. Columns are themselves key-value pairs. Column names do not have to be predefined, i.e. the structure is not fixed. The columns in a row are stored in sorted order according to their keys (names).
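A rough Python model of a single row, to illustrate "columns are key-value pairs kept sorted by name". This is not a Cassandra client API; the `Row` class is hypothetical, and it sorts names as plain strings, whereas Cassandra sorts according to the column family's configured comparator.

```python
import bisect

class Row:
    """Conceptual model of one Cassandra row: columns kept sorted by name.

    Hypothetical class for illustration; not part of any Cassandra driver.
    """

    def __init__(self) -> None:
        self._names: list[str] = []          # column names, always kept sorted
        self._values: dict[str, bytes] = {}  # column name -> value

    def insert(self, name: str, value: bytes) -> None:
        # Column names are not predefined: any name can be added at any time.
        if name not in self._values:
            bisect.insort(self._names, name)
        self._values[name] = value

    def columns(self) -> list[tuple[str, bytes]]:
        # Iteration order follows the sort order of the column names.
        return [(n, self._values[n]) for n in self._names]

row = Row()
row.insert("email", b"ann@example.com")
row.insert("age", b"42")
row.insert("name", b"Ann")
print([name for name, _ in row.columns()])  # ['age', 'email', 'name']
```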

In some cases, you may have a very large number of columns per row (for example, when a row acts as an index that enables certain kinds of queries). Cassandra can handle such large structures efficiently, and you can retrieve specific ranges of columns.
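Because columns are sorted by name, fetching a range of columns from a wide row does not require scanning the whole row. The sketch below shows the idea with two binary searches; the timestamp-named columns and the `column_slice` helper are hypothetical, and this is not how a Cassandra client is actually called.

```python
import bisect

# Hypothetical wide row used as an index: one column per event, named by its
# timestamp so that columns sort chronologically. Values are event ids.
column_names = [1000, 1005, 1010, 1020, 1030, 1100]   # kept sorted
column_values = ["e1", "e2", "e3", "e4", "e5", "e6"]

def column_slice(names, values, start, finish):
    """Return (name, value) pairs with start <= name <= finish.

    Since names are sorted, the bounds are found with two binary searches
    instead of a scan over every column in the row.
    """
    lo = bisect.bisect_left(names, start)
    hi = bisect.bisect_right(names, finish)
    return list(zip(names[lo:hi], values[lo:hi]))

print(column_slice(column_names, column_values, 1005, 1020))
# [(1005, 'e2'), (1010, 'e3'), (1020, 'e4')]
```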

There is another level of structure (not used as often) called super columns, where a column contains nested subcolumns.

You can think of the overall structure as a nested hashtable/dictionary with 2 or 3 levels of keys.

Regular column family:

    row
        col  col  col ...
        val  val  val ...

Super column family:

    row
        supercol                      supercol                      ...
            (sub)col  (sub)col  ...       (sub)col  (sub)col  ...
            val       val       ...       val       val       ...

There are also higher-level structures, column families and keyspaces, that can be used to separate or group your data.
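The nested-dictionary picture, including the higher levels, can be written out directly in Python. This is a conceptual sketch only: the keyspace and column family names are made up, and plain dicts ignore the sorted ordering Cassandra maintains within a row.

```python
# Conceptual nesting only (not a client API); all names are hypothetical.
# keyspace -> column family -> row key -> column name -> value
keyspace = {
    "Users": {                                    # regular column family
        "ann": {"email": "ann@example.com", "city": "Oslo"},
        "bob": {"email": "bob@example.com"},       # rows may have different columns
    },
    "UserPosts": {                                # super column family
        "ann": {                                  # row key
            "post-1": {"title": "Hi", "body": "Hello"},  # super column -> subcolumns
            "post-2": {"title": "Again"},
        },
    },
}

# Two key levels below the column family for a regular column family...
print(keyspace["Users"]["ann"]["email"])
# ...and three for a super column family.
print(keyspace["UserPosts"]["ann"]["post-1"]["title"])
```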

See also this question: Cassandra: what is a column

Or the data modeling links at http://wiki.apache.org/cassandra/ArticlesAndPresentations

Re: comparison with document-oriented databases: the latter usually insert whole documents (typically JSON), whereas in Cassandra you can address individual columns or super columns and update them individually, i.e. work at a finer level of granularity. Each column has its own timestamp/version, which is used to reconcile updates across a distributed cluster.
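Per-column timestamps mean two replicas can be reconciled column by column, so concurrent updates to different columns of the same row both survive. The sketch below shows a simplified last-write-wins merge under that assumption; `merge_rows` and the sample data are hypothetical, and real Cassandra also deals with deletes (tombstones) and timestamp ties, which are ignored here.

```python
from typing import Dict, Tuple

# A replica's copy of one row: column name -> (value, write timestamp).
Row = Dict[str, Tuple[str, int]]

def merge_rows(a: Row, b: Row) -> Row:
    """Reconcile two replicas column by column: the higher timestamp wins.

    Simplified sketch: no tombstones, and ties keep the value from `a`.
    """
    merged = dict(a)
    for name, (value, ts) in b.items():
        if name not in merged or ts > merged[name][1]:
            merged[name] = (value, ts)
    return merged

replica_1 = {"email": ("old@example.com", 10), "city": ("Oslo", 30)}
replica_2 = {"email": ("new@example.com", 20), "city": ("Bergen", 25)}
print(merge_rows(replica_1, replica_2))
# {'email': ('new@example.com', 20), 'city': ('Oslo', 30)}
```

Had each replica stored the row as one opaque document with a single version, one of the two updates would have been discarded instead.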

Cassandra column values are just bytes, but they can be typed as ASCII, UTF-8 text, numbers, dates, etc.
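As an illustration of "just bytes, but typed", here is how a client might serialize values before storing them. The encodings are assumptions modeled on what Cassandra's AsciiType, UTF8Type, LongType (8-byte big-endian signed) and DateType (milliseconds since the epoch as a long) are generally understood to use; the helper functions are hypothetical and should be checked against the documentation for your version.

```python
import struct
from datetime import datetime, timezone

# Hypothetical helpers: the stored value is always raw bytes, and the declared
# type only tells clients how to interpret them (assumed encodings, see above).

def encode_ascii(s: str) -> bytes:
    return s.encode("ascii")

def encode_utf8(s: str) -> bytes:
    return s.encode("utf-8")

def encode_long(n: int) -> bytes:
    # 8-byte big-endian signed integer
    return struct.pack(">q", n)

def decode_long(b: bytes) -> int:
    return struct.unpack(">q", b)[0]

def encode_date(dt: datetime) -> bytes:
    # Milliseconds since the Unix epoch, stored like a long.
    return encode_long(int(dt.timestamp() * 1000))

print(encode_long(1))        # b'\x00\x00\x00\x00\x00\x00\x00\x01'
print(encode_utf8("Zoë"))    # b'Zo\xc3\xab'
```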

Of course, you could use Cassandra as a primitive document store by inserting columns containing JSON, but you will not get all the features of a real document-oriented store.
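A small sketch of why JSON-in-a-column falls short: the store only sees opaque bytes, so filtering on a field inside the JSON has to happen on the client after fetching and decoding every value. The row layout and the `find_by_field` helper are hypothetical.

```python
import json

# One column per document, whose value is a JSON blob (hypothetical layout).
row = {
    "doc:1": json.dumps({"name": "Ann", "city": "Oslo"}).encode(),
    "doc:2": json.dumps({"name": "Bob", "city": "Bergen"}).encode(),
}

def find_by_field(row: dict, field: str, value) -> list:
    # No server-side query on fields inside the JSON: the client fetches
    # and decodes every blob, then filters.
    hits = []
    for name, blob in row.items():
        doc = json.loads(blob)
        if doc.get(field) == value:
            hits.append(name)
    return hits

print(find_by_field(row, "city", "Oslo"))  # ['doc:1']
```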

Sep 27 '11 at 8:13

The main difference is that document stores (e.g. MongoDB and CouchDB) allow arbitrarily complex documents, i.e. subdocuments within subdocuments, lists of documents, etc., while column stores (e.g. Cassandra and HBase) allow only a fixed format, e.g. strictly one-level or two-level dictionaries.
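One common workaround for that fixed depth, offered here as an assumption rather than a built-in feature of either database, is to flatten a nested document into a single level of columns with dotted names. The `flatten` function is a hypothetical sketch; lists are kept as opaque values, since they have no natural column-name mapping.

```python
from typing import Any, Dict

def flatten(doc: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into one level using dotted column names.

    Hypothetical helper; lists and other non-dict values are stored as-is.
    """
    flat: Dict[str, Any] = {}
    for key, value in doc.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))  # recurse into subdocuments
        else:
            flat[name] = value
    return flat

doc = {"name": "Ann", "address": {"city": "Oslo", "geo": {"lat": 59.9}}, "tags": ["a", "b"]}
print(flatten(doc))
# {'name': 'Ann', 'address.city': 'Oslo', 'address.geo.lat': 59.9, 'tags': ['a', 'b']}
```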

Sep 28 '11 at 13:37

For "inserts", to use RDBMS terms, a document-based system is more consistent and straightforward. Note that Cassandra lets you achieve consistency through the notion of a quorum, but that does not apply to all column-based systems, and it reduces availability. For a write-once/read-often system, go for MongoDB. Also consider it if you always plan to read the whole structure of an object: a document-based system is designed to return the whole document when you fetch it, and is not very good at returning only parts of it.
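The quorum trade-off can be checked with simple arithmetic. With N replicas of a key, reads that wait for R replicas and writes that wait for W replicas are guaranteed to overlap when R + W > N, but every extra replica you wait for is one fewer that may be down. The sketch below uses the usual Dynamo-style symbols N, R, W (names introduced here, not taken from the answer) and a majority quorum of floor(N/2) + 1.

```python
def quorum(replication_factor: int) -> int:
    # A majority of the replicas for a key.
    return replication_factor // 2 + 1

def reads_see_latest_write(n: int, r: int, w: int) -> bool:
    # If every read set overlaps every write set (R + W > N), a read touches
    # at least one replica holding the latest acknowledged write.
    return r + w > n

def replicas_that_may_fail(n: int, required: int) -> int:
    # How many replicas can be down while an operation needing `required`
    # acknowledgements still succeeds.
    return n - required

n = 3
q = quorum(n)                                   # 2
print(reads_see_latest_write(n, q, q))          # True  (2 + 2 > 3)
print(reads_see_latest_write(n, 1, 1))          # False (1 + 1 <= 3)
print(replicas_that_may_fail(n, q), replicas_that_may_fail(n, 1))  # 1 2
```

So with three replicas, quorum reads and writes give consistent reads but tolerate only one failed replica, while single-replica operations tolerate two failures at the cost of possibly stale reads.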

Column-based systems such as Cassandra are much better than document-based ones at "updates". You can change the value of a column without even reading the row that contains it. In fact, the write does not even need to happen on the same server; a row can be spread across several files on several servers. For a huge, fast-changing data system, go for Cassandra. Also consider it if you plan to have a very large amount of data per key and will not need to load all of it on every query. For "selects", Cassandra lets you load only the columns you need.
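A toy comparison of the two update styles the answer contrasts: a whole-document read-modify-write versus a blind write of a single column, plus a select that loads only the requested columns. Both "stores" are hypothetical in-memory dicts. Real document stores such as MongoDB also offer partial-update operators, so this contrasts the models as the answer describes them, not the exact behavior of either product.

```python
from typing import Dict, Iterable

# Toy in-memory stores (hypothetical, for comparison only).
documents: Dict[str, dict] = {"ann": {"name": "Ann", "city": "Oslo", "bio": "x" * 10_000}}
columns: Dict[str, Dict[str, str]] = {"ann": {"name": "Ann", "city": "Oslo", "bio": "x" * 10_000}}

def update_document(doc_id: str, field: str, value: str) -> None:
    # Whole-document model: read the document, change it, write all of it back.
    doc = dict(documents[doc_id])
    doc[field] = value
    documents[doc_id] = doc

def write_column(row_key: str, name: str, value: str) -> None:
    # Column model: upsert one column without reading the row first;
    # the row does not even need to exist yet.
    columns.setdefault(row_key, {})[name] = value

def select_columns(row_key: str, names: Iterable[str]) -> Dict[str, str]:
    # Load only the requested columns, skipping large ones such as "bio".
    row = columns.get(row_key, {})
    return {n: row[n] for n in names if n in row}

update_document("ann", "city", "Bergen")
write_column("ann", "city", "Bergen")
write_column("zoe", "city", "Tromsø")           # blind write to a brand-new row
print(select_columns("ann", ["name", "city"]))  # {'name': 'Ann', 'city': 'Bergen'}
```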

Also consider that MongoDB is written in C++ and is on its second major release, while Cassandra needs to run on the JVM, and its first major release has been a release candidate only since yesterday (although the 0.x releases are already in production at large companies).

On the other hand, Cassandra's design was partly based on Amazon Dynamo, and it is built at its core as a high-availability solution, but that has nothing to do with the column-based format. MongoDB also scales out, but not as gracefully as Cassandra.

Sep 28 '11 at 12:59


