Understanding Cassandra's Philosophy

I am getting to know Apache Cassandra for a specific proof-of-concept (PoC) project. After reading various articles online and testing several libraries and clients, one question keeps coming up.

We initially considered Cassandra because we wanted a "truly" distributed data store. As I understand "distribution", it all boils down to some "key-value" and some "consistent hashing", to put it loosely!
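To make the "key-value plus consistent hashing" intuition concrete, here is a minimal Python sketch of a token ring. It is a conceptual model under stated assumptions, not Cassandra's implementation: MD5 is used as a stand-in hash (Cassandra's default partitioner is Murmur3), the node names and the `vnodes` count are hypothetical, and replicas are chosen by walking the ring clockwise, roughly like SimpleStrategy.

```python
import bisect
import hashlib


def token(key: str) -> int:
    # Stand-in hash (assumption): first 8 bytes of MD5 as an unsigned integer.
    # Cassandra's default Murmur3Partitioner uses a different hash.
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class TokenRing:
    def __init__(self, nodes, vnodes=8):
        # Each node owns several points ("virtual nodes") on the ring.
        self._ring = sorted(
            (token(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._tokens = [t for t, _ in self._ring]

    def owner(self, key: str) -> str:
        # First ring point at or after the key's token, wrapping around the end.
        idx = bisect.bisect_left(self._tokens, token(key)) % len(self._ring)
        return self._ring[idx][1]

    def replicas(self, key: str, rf: int) -> list:
        # Primary owner plus the next distinct nodes clockwise.
        start = bisect.bisect_left(self._tokens, token(key))
        result = []
        for i in range(len(self._ring)):
            node = self._ring[(start + i) % len(self._ring)][1]
            if node not in result:
                result.append(node)
            if len(result) == rf:
                break
        return result


ring = TokenRing(["node-a", "node-b", "node-c"])
print(ring.owner("user:42"), ring.replicas("user:42", 2))
```

The property that makes this "consistent": when a node is added, each key either keeps its previous owner or moves to the new node, so only a fraction of the data has to be relocated.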

So a key-value store such as Cassandra seems ideal. However, as I study articles on data modeling in Cassandra, almost all of them explain and illustrate it with CQL. The official position also seems to be that CQL should be the de facto way to learn Cassandra. Why such a push toward an SQL-like language?

I do not need a relational model, which is why I came to Cassandra. I appreciate its core concepts, such as partition keys and clustering columns, and I would like to understand how they are implemented under the hood of CQL.
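Conceptually, a CQL table is still a key-based structure: the partition key selects a partition (and, through hashing, the nodes that hold it), and the clustering columns keep the rows inside that partition sorted. The following Python sketch models only that logical layout; the `readings` schema in the docstring is hypothetical, and Cassandra's real storage engine (memtables, SSTables, compaction) is not represented.

```python
import bisect
from collections import defaultdict


class Partition:
    """All rows sharing one partition key, kept sorted by clustering key."""

    def __init__(self):
        self.clustering_keys = []  # sorted list of clustering keys
        self.rows = {}             # clustering key -> {column name: value}


class Table:
    """Logical model of a CQL table such as (hypothetical schema):

        CREATE TABLE readings (
            sensor_id text, ts int, value double, unit text,
            PRIMARY KEY ((sensor_id), ts));

    sensor_id is the partition key; ts is the clustering column.
    """

    def __init__(self):
        self._partitions = defaultdict(Partition)

    def insert(self, partition_key, clustering_key, **cells):
        part = self._partitions[partition_key]
        if clustering_key not in part.rows:
            bisect.insort(part.clustering_keys, clustering_key)
            part.rows[clustering_key] = {}
        # An INSERT acts as an upsert: the same primary key merges/overwrites cells.
        part.rows[clustering_key].update(cells)

    def select(self, partition_key, start=None, end=None):
        """Yield (clustering_key, cells) from one partition in clustering order,
        optionally restricted to start <= clustering_key <= end."""
        part = self._partitions.get(partition_key)  # .get does not create a partition
        if part is None:
            return
        keys = part.clustering_keys
        lo = 0 if start is None else bisect.bisect_left(keys, start)
        hi = len(keys) if end is None else bisect.bisect_right(keys, end)
        for ck in keys[lo:hi]:
            yield ck, part.rows[ck]


t = Table()
t.insert("sensor-1", 3, value=30.0)
t.insert("sensor-1", 1, value=10.0)
print(list(t.select("sensor-1")))
```

The efficient query shape is visible in `select`: one partition key, then a contiguous slice of clustering keys. Queries that do not start from a partition key have no such direct path, which is the "think with a key" mindset expressed through CQL.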

A question for the Cassandra experts: am I really missing the point of Cassandra? Should I forget about its core ideas and simply try to fit my use case into CQL (if that is possible)?

Answer

CQL is more than syntactic sugar, although it was originally created to make migration from the SQL world easier. The world before CQL was a mess: dozens of clients, each written differently on top of the Thrift protocol. And unlike the SQL world, Cassandra improves constantly, introducing new features in each release, and very often each of these improvements required a new client version able to handle a new kind of result (think of counters or collections) or a new syntax for using a new feature.

I am glad I had the opportunity to run in production for more than 3 years with a Thrift client (Pelops). It helped me understand a lot about the Cassandra world, its data structures and so on, but now I would never go back to such a client (even though it was really great!).

In the beginning, Cassandra was quite different; in particular, it was/had:

  • schemaless: each row of a column family (CF) could contain a different number of columns, and there was no place where these columns had to be declared. This was a disaster for many projects: because new columns could be added at runtime, you never knew what you might find in a table.

  • super columns: an obsolete data structure, since replaced by wide rows
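To illustrate how a wide row can replace super columns, here is a small Python sketch that flattens a row of the form `{super_col: {col: value}}` into a single sorted list of cells with composite `(super_col, col)` names. In CQL this roughly corresponds to `PRIMARY KEY ((row_key), super_col, col)`, with two clustering columns; the helper names and the sample data are hypothetical and only show the idea.

```python
import bisect


def flatten_super_row(super_row):
    """Convert an old-style super-column row {super_col: {col: value}} into a
    wide row: a list of ((super_col, col), value) cells sorted by composite name."""
    return sorted(
        ((super_col, col), value)
        for super_col, columns in super_row.items()
        for col, value in columns.items()
    )


def slice_super_column(wide_row, super_col):
    """Read back what used to be one super column: the contiguous run of cells
    whose composite name starts with super_col."""
    names = [name for name, _ in wide_row]
    i = bisect.bisect_left(names, (super_col,))  # (x,) sorts before any (x, y)
    result = {}
    while i < len(wide_row) and wide_row[i][0][0] == super_col:
        (_, col), value = wide_row[i]
        result[col] = value
        i += 1
    return result


row = {
    "phone": {"work": "555-0101", "home": "555-0100"},
    "address": {"zip": "00100", "city": "Rome"},
}
wide = flatten_super_row(row)
print(wide)
print(slice_super_column(wide, "phone"))
```

Because the composite names are sorted, each former super column becomes a contiguous slice of the wide row, so it can still be read with a single range scan.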

Now that the data model is stable, CQL provides much better readability: you can join a project you are not familiar with and quickly understand how the application talks to the database thanks to its uniform syntax. And every new Cassandra release is immediately followed by a new client version.

CQL is not a "subset" of SQL, as many people write: in a way it is a "superset", because it can handle different data structures, extending the base language.

My answer: think in terms of keys, but use CQL ONLY.

HTH, Carlo


Source: https://habr.com/ru/post/1216062/
