There are two kinds of storage to think about here - disk and RAM. I assume you are asking about disk storage.
First you need to find out how much space the node is using. Check the disk usage of your Cassandra data directory (by default /var/lib/cassandra/data) with:

    du -ch /var/lib/cassandra/data
Then compare that with the total size of your disk, which you can find with df -h. Only look at the df entry for the disk your Cassandra data is stored on, which you can identify from the "Mounted on" column.
Using those two numbers, you can calculate what percentage of the disk your Cassandra data takes up. As a rule, you do not want to get too close to 100%, because Cassandra's normal compaction processes temporarily need extra disk space. If you run short, a node can get stuck with a full disk, which can be painful to resolve (as an aside, I sometimes keep a "ballast" file of a few gigabytes around that I can delete when I need to free up space in a hurry). In general I have found that staying below 70% disk usage keeps you on the safe side for the 0.8 series.
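If it helps, here is that check as a minimal shell sketch (assuming the default data directory; df accepts a path and reports on the filesystem containing it):

    # Total on-disk size of the Cassandra data directory
    du -sh /var/lib/cassandra/data

    # Usage of the filesystem holding that directory, including the Use% column
    df -h /var/lib/cassandra/data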
If you are on a newer version of Cassandra, I would recommend giving the Leveled Compaction strategy a try to reduce temporary disk usage. Instead of potentially needing twice your data size in free space, the new strategy in most cases needs only about 10x a small, fixed sstable size (5 MB by default).
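As a sketch of how you might switch a table over (assuming a version new enough for CQL3; the keyspace and table names are placeholders, and sstable_size_in_mb mirrors the 5 MB default):

    # Hypothetical keyspace/table names; run against one of your nodes
    cqlsh -e "ALTER TABLE my_keyspace.my_table
              WITH compaction = {'class': 'LeveledCompactionStrategy',
                                 'sstable_size_in_mb': 5};"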
You can learn more about how compaction temporarily increases disk usage, and about the compaction strategies themselves, in this excellent Datastax blog post: http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra
So, to do a little capacity planning, you can work out how much space you will need. With a replication factor of 3 (which you are using above), adding 20-30 GB of raw data becomes 60-90 GB after replication. Divided among your 9 nodes, that is roughly 7-10 GB more per node. Does that added disk usage push any node too close to a full disk? If so, you may want to add more nodes to the cluster.
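The back-of-the-envelope math, as a quick shell check (using the 30 GB upper bound and the 9 nodes from above):

    # 30 GB raw * replication factor 3, divided across 9 nodes
    echo $(( 30 * 3 / 9 ))   # prints 10 (GB of new data per node)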
One more note: the load on your nodes is quite uneven - from 2 GB to 7 GB. If you are using the ByteOrderedPartitioner instead of the RandomPartitioner, that can cause uneven load and hot spots in your ring; use the RandomPartitioner if at all possible. Another possibility is that you have extra data lying around that needs taking care of (hinted handoffs and snapshots come to mind). Try cleaning that up by running nodetool repair and nodetool cleanup on each node, one at a time (be sure to read up on what they do first!).
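A minimal sketch of doing that one node at a time (the host names are placeholders; let each node finish before starting the next):

    # Repair, then clean up, each node in turn
    for host in node1 node2 node3; do
        nodetool -h "$host" repair
        nodetool -h "$host" cleanup
    done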
Hope this helps.
Andrew