Cassandra node restrictions

I am trying to find out whether Cassandra imposes any limits on node hardware specifications, for example a maximum storage capacity per node, if such a restriction exists.

I intend to use a pair of nodes with 48 TB of storage per node (24 × 2 TB 7200 rpm hard drives), each with a decent dual-core Xeon processor.

I searched for such restrictions but did not find any material on this issue. Also, why has there been so much less buzz lately, now that Cassandra has matured and reached version 0.8, while most articles/blogs still only cover 0.6?

+7
5 answers

Cassandra distributes its data by row, so the only hard restriction is that a single row must fit on one node.

So the short answer is no.

The longer answer is that you want to make sure you set up separate storage volumes for your data files and your commit logs.
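As a sketch, these are the two relevant settings in cassandra.yaml (the mount points below are illustrative, not a recommendation for your layout):

```yaml
# cassandra.yaml (excerpt) -- paths are hypothetical examples.
# Keeping the commit log on its own spindle means its sequential
# append-only writes are never interrupted by SSTable reads or
# compaction I/O on the data disks.
data_file_directories:
    - /mnt/data1/cassandra/data
commitlog_directory: /mnt/commitlog/cassandra/commitlog
```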

Another thing to keep in mind is that you will still run into disk seek speed issues. One of the nice things about Cassandra is that you don't need to put so much data on one node (and in fact it is probably not recommended: your storage will far outstrip your compute power). If you use smaller nodes (with a reasonable amount of disk space), your storage and processing capacity will scale together.

+7

Here are a few notes about the big data set requirements.

48 TB of data per node is probably too much. It would be much better to have more nodes, each with less data. You need to periodically run nodetool repair, which involves reading all the data on the machine. If you store many terabytes of data on a machine, that will be very painful.

I would limit each node to approximately 1 TB of data.
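To see why, here is a back-of-envelope sketch of how long one full read of a node's data takes. The throughput figure (~100 MB/s sustained per 7200 rpm drive) and the assumption that repair is purely I/O-bound and stripes perfectly across all drives are hypothetical best-case inputs, not measurements:

```python
def full_scan_hours(data_tb, drives=24, mb_per_s_per_drive=100.0):
    """Hours to read `data_tb` terabytes once, striped across `drives`."""
    total_mb = data_tb * 1024 * 1024          # TB -> MB
    throughput = drives * mb_per_s_per_drive  # aggregate MB/s, best case
    return total_mb / throughput / 3600       # seconds -> hours

print(round(full_scan_hours(48), 1))            # 48 TB across 24 drives -> 5.8
print(round(full_scan_hours(1, drives=1), 1))   # 1 TB on one drive -> 2.9
```

Even in this idealized case a full pass over 48 TB takes the better part of a working day per node, and real repairs (random reads, Merkle tree building, streaming) are slower than a pure sequential scan.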

+7

See How much data is in a node in a Cassandra cluster?

which suggests that between 1 and 10 TB per node is reasonable, depending on your application. Cassandra will probably still work with 48 TB, but not optimally.

Do you intend to use a replication factor of 1 or 2 (if you have 2 nodes as described above)?
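For context, the replication factor is set per keyspace. A minimal sketch in later CQL syntax (the keyspace name is hypothetical; 0.8 used a different syntax via cassandra-cli):

```sql
-- With only 2 nodes, replication_factor = 2 means every node
-- stores a full copy of the data.
CREATE KEYSPACE myks
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2};
```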

Some operations (repair, compaction) can be extremely slow with that much data on one node.

+5

You must also be careful when using a large amount of RAM with Cassandra. RAM is great for caching SSTable data, but with the JVM, too much heap space is counterproductive. Do not give the JVM more than about 12 GB of heap, otherwise garbage collection pauses will take too long and hurt performance. This is another reason why smaller nodes are better with Cassandra.
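As a sketch, the heap is capped in conf/cassandra-env.sh; the values below are illustrative, not a tuned recommendation:

```shell
# conf/cassandra-env.sh (excerpt) -- illustrative values.
# Cap the JVM heap well below total RAM and leave the remainder to
# the OS page cache, which Cassandra relies on for fast reads.
MAX_HEAP_SIZE="8G"
# Young-generation size; commonly set to ~100 MB per CPU core.
HEAP_NEWSIZE="200M"
```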

+5

DataStax, which is the primary vendor, recommends 3 to 5 TB per node.

See:

https://docs.datastax.com/en/cassandra/1.2/cassandra/architecture/architecturePlanningHardware_c.html

+1
