Analytics - MongoDB or Cassandra

I use MongoDB today and I am really happy with it. I need a solution for event logging. The log contains content impressions and clicks (as in ad systems). The workload is write-heavy and read-light (reads are mainly for daily reporting). Something like Cassandra seems to be the best fit, while MongoDB seems better suited to a document-oriented architecture. Any thoughts?

4 answers

One of the nice things about Cassandra is its support for Hadoop MapReduce, which gives you access to a very robust ecosystem of tools (such as Pig), examples, and so on.
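To make the map/reduce model concrete for this use case, here is a minimal in-process sketch that computes impressions and clicks per ad. It is not Cassandra's Hadoop integration or a Pig script; the tab-separated log format (`ad_id<TAB>event_type`) and all function names are hypothetical. A real Hadoop job spreads the map, shuffle, and reduce phases across a cluster, but the data flow is the same.

```python
from collections import defaultdict
from typing import Iterable, Iterator

Counts = tuple[int, int]  # (impressions, clicks)

def map_line(line: str) -> Iterator[tuple[str, Counts]]:
    """Map phase: turn one hypothetical log line "ad_id<TAB>event_type" into (key, value) pairs."""
    ad_id, event_type = line.rstrip("\n").split("\t")
    if event_type == "impression":
        yield ad_id, (1, 0)
    elif event_type == "click":
        yield ad_id, (0, 1)
    # any other event type emits nothing

def reduce_counts(values: Iterable[Counts]) -> Counts:
    """Reduce phase: sum all partial counts emitted for one key."""
    impressions = clicks = 0
    for imp, clk in values:
        impressions += imp
        clicks += clk
    return impressions, clicks

def run_job(lines: Iterable[str]) -> dict[str, Counts]:
    """Run map, shuffle (group values by key), and reduce in a single process."""
    groups: defaultdict[str, list[Counts]] = defaultdict(list)
    for line in lines:
        for key, value in map_line(line):
            groups[key].append(value)
    return {key: reduce_counts(vals) for key, vals in sorted(groups.items())}

log = ["ad1\timpression", "ad1\tclick", "ad2\timpression", "ad1\timpression"]
print(run_job(log))  # {'ad1': (2, 1), 'ad2': (1, 0)}
```

Because the reducer only sums, it could also run as a combiner on each mapper's output, which is what keeps this kind of job cheap at scale.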

Depending on the amount of data and your use case, you can also take advantage of the expiring columns feature ( http://www.datastax.com/dev/blog/whats-new-cassandra-07-expiring-columns ).
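In CQL, expiring columns are exposed as a per-write TTL (`USING TTL <seconds>`) and as the `default_time_to_live` table option. A minimal sketch, assuming a hypothetical `events` table, that only builds the statements; with the DataStax Python driver you would pass the INSERT to `session.prepare()` and bind values, which is left out here so the sketch runs without a cluster.

```python
SECONDS_PER_DAY = 86_400

def insert_with_ttl(table: str, columns: list[str], ttl_days: int) -> str:
    """Build a prepared-statement INSERT whose cells expire after ttl_days."""
    if ttl_days <= 0:
        raise ValueError("ttl_days must be positive")
    cols = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)  # bind markers for a prepared statement
    return (f"INSERT INTO {table} ({cols}) VALUES ({placeholders}) "
            f"USING TTL {ttl_days * SECONDS_PER_DAY}")

def default_ttl(table: str, ttl_days: int) -> str:
    """Build an ALTER TABLE that makes new writes to the table expire by default."""
    return f"ALTER TABLE {table} WITH default_time_to_live = {ttl_days * SECONDS_PER_DAY}"

print(insert_with_ttl("events", ["event_id", "event_time", "payload"], ttl_days=30))
print(default_ttl("events", ttl_days=30))
```

The table-level default is convenient when all raw events share one retention period; a per-write TTL lets different event types expire at different times. Either way, expired cells are purged later, during compaction, rather than at the moment they expire.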

Gemini recently open-sourced its real-time log processing tool, built on Flume and Cassandra, which may be close to what you want ( http://www.thestreet.com/story/11030367/1/gemini-releases-real-time-log-processing-based-on-flume-and-cassandra.html , https://github.com/geminitech/logprocessing ).

We used MongoDB in one project to log events for a distributed application. It works very well, but it makes sense to estimate storage size, the number of shards, and other factors in advance.
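A back-of-the-envelope version of that estimate, as a sketch: every input (event rate, average document size, retention, overhead factor, per-shard capacity) is a hypothetical placeholder to replace with your own measurements, for example the average document size MongoDB reports in its collection stats.

```python
import math

GIB = 1024 ** 3

def estimate_storage_gib(events_per_day: int, avg_doc_bytes: int, retention_days: int,
                         overhead: float = 1.0) -> float:
    """Size of one copy of the data kept for the retention window, times an
    overhead factor meant to cover indexes and storage-engine slack."""
    raw_bytes = events_per_day * avg_doc_bytes * retention_days
    return raw_bytes * overhead / GIB

def shards_needed(total_gib: float, gib_per_shard: float) -> int:
    """Smallest number of shards whose combined capacity holds total_gib."""
    return math.ceil(total_gib / gib_per_shard)

# Hypothetical workload: 10M events/day, 500-byte documents, 30 days kept, 50% overhead.
one_copy = estimate_storage_gib(10_000_000, 500, 30, overhead=1.5)
print(f"{one_copy:.1f} GiB per copy of the data")
print(shards_needed(one_copy, gib_per_shard=100), "shards at 100 GiB each")
print(f"{one_copy * 3:.1f} GiB of disk if every shard is a 3-member replica set")
```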

As a suggestion, use a capped collection and run a map/reduce job every 24 hours or so to reduce the raw logs into a summary collection holding the values you need. I also noticed that, because MongoDB has no fixed "schema" (every document stores its own field names), documents can make the database file grow quickly.
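A minimal sketch of that pattern in plain Python, so it runs without a server: a `deque` with `maxlen` stands in for the capped collection (fixed capacity, insertion order, oldest documents evicted first), and a per-day count stands in for the nightly map/reduce output. The document fields (`ts`, `type`) and function names are hypothetical. A real capped collection is bounded mainly by size in bytes (in pymongo, something like `db.create_collection("events", capped=True, size=...)`), not only by document count.

```python
from collections import Counter, deque
from datetime import date, datetime

def make_capped_log(max_docs: int) -> deque:
    """Model of a capped collection: fixed capacity, insertion order, oldest documents evicted first."""
    return deque(maxlen=max_docs)

def daily_rollup(log: deque, day: date) -> dict[str, int]:
    """Count one day's events per type: the summary a nightly map/reduce job would store."""
    return dict(Counter(doc["type"] for doc in log if doc["ts"].date() == day))

# Hypothetical day of traffic written into a log capped at 4 documents.
log = make_capped_log(max_docs=4)
for hour, kind in [(1, "impression"), (2, "click"), (3, "impression"), (4, "impression"), (5, "click")]:
    log.append({"ts": datetime(2024, 1, 1, hour), "type": kind})

summary = {}  # stand-in for the summary collection, keyed by day
summary[date(2024, 1, 1)] = daily_rollup(log, date(2024, 1, 1))
print(summary)  # 5 events written, 4 kept: the 01:00 impression was evicted
```

The example also shows the main pitfall of this design: if the cap is too small for a day's traffic, the oldest events are evicted before the rollup runs and the summary undercounts, so size the cap for at least one full rollup interval.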

Cassandra is optimized for high write throughput (many thousands of writes per second), so it at least fits that criterion. However, if MongoDB performs well enough for your application and you are already familiar with it, Cassandra may not give you a big advantage.

Actually, neither of these databases is meant for analytics on its own. Whenever you choose a NoSQL solution for a problem, you need to think about how you will query and manipulate the data.

Cassandra is ideal for writing huge amounts of data with predictable performance, and it is easy to scale across multiple data centers. On the other hand, read performance depends on the consistency level you use.
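The reason is how many replicas must answer before a request succeeds. A sketch of that arithmetic, covering only the basic levels ONE, TWO, THREE, QUORUM, and ALL (multi-data-center levels such as LOCAL_QUORUM are deliberately left out): QUORUM is a majority of the replication factor (RF // 2 + 1), and reads are guaranteed to see the latest acknowledged write when read replicas + write replicas > RF. Higher read levels mean waiting for more, and possibly slower, replicas.

```python
def replicas_required(level: str, replication_factor: int) -> int:
    """Number of replicas that must acknowledge a request at the given consistency level."""
    quorum = replication_factor // 2 + 1  # a strict majority of the replicas
    required = {"ONE": 1, "TWO": 2, "THREE": 3, "QUORUM": quorum, "ALL": replication_factor}
    if level not in required:
        raise ValueError(f"level not covered by this sketch: {level}")
    n = required[level]
    if n > replication_factor:
        raise ValueError(f"{level} needs {n} replicas but RF is {replication_factor}")
    return n

def reads_see_latest_write(read_level: str, write_level: str, replication_factor: int) -> bool:
    """True when every set of read replicas overlaps every set of write replicas (R + W > RF)."""
    r = replicas_required(read_level, replication_factor)
    w = replicas_required(write_level, replication_factor)
    return r + w > replication_factor

for read, write in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
    print(read, write, reads_see_latest_write(read, write, 3))
```

With RF = 3, QUORUM reads plus QUORUM writes (2 + 2 > 3) always overlap, while ONE/ONE (1 + 1) is the fastest combination but may return stale data.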

MongoDB is ideal for structured data, which is not really an advantage in your case. MongoDB keeps its data consistent, but that guarantee can cost performance. Moreover, MongoDB is not well suited to deployments spanning many data centers.

Data access is also completely different. Cassandra provides CQL (a SQL-like language), which does not support joins and offers very limited grouping and aggregation. MongoDB, by contrast, uses JSON-style queries and JavaScript, with its own map/reduce implementation for aggregation operations.

To summarize, consider all of these factors when choosing between the two databases. In my view, Cassandra is well suited to your task, but think carefully about the data model and the queries you will run before committing to it.
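As an illustration of designing the model around the queries, here is a sketch of a hypothetical table for the daily report: one partition per (day, event type), rows ordered by time, so the report reads a few partitions instead of scanning everything. The table, column names, and helper are assumptions; the CQL is kept as strings so the sketch runs without a cluster.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical query-first table: one partition per (day, event type), rows sorted by time.
CREATE_EVENTS_BY_DAY = """
CREATE TABLE IF NOT EXISTS events_by_day (
    event_day  text,
    event_type text,
    event_time timestamp,
    event_id   timeuuid,
    payload    text,
    PRIMARY KEY ((event_day, event_type), event_time, event_id)
) WITH CLUSTERING ORDER BY (event_time DESC, event_id ASC)
"""

# The daily report reads exactly one partition per event type.
DAILY_REPORT = "SELECT event_time, payload FROM events_by_day WHERE event_day = ? AND event_type = ?"

def partition_key(event_time: datetime, event_type: str) -> tuple[str, str]:
    """Return the (event_day, event_type) partition for an event, bucketing by UTC calendar day."""
    if event_time.tzinfo is None:
        raise ValueError("use timezone-aware timestamps so every writer buckets the same way")
    return event_time.astimezone(timezone.utc).strftime("%Y-%m-%d"), event_type

# 00:30 at UTC+2 is still 2024-01-01 in UTC, so it lands in the previous day's partition.
print(partition_key(datetime(2024, 1, 2, 0, 30, tzinfo=timezone(timedelta(hours=2))), "click"))
```

Bucketing by day keeps each report to a handful of partitions; if one event type produces very large daily partitions, a finer bucket (for example per hour) is a common adjustment.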

P.S. For analytics, I also recommend looking at SQL query engines such as Apache Drill for MongoDB and PrestoDB for Cassandra.
