Cassandra Compression Codebase

I want to know exactly how many bytes are stored on disk when I insert a new column into a Cassandra column family. My main problem is that I need this figure when the columns are compressed with Snappy. I know how to calculate the raw byte count, but because the data varies so much, I cannot reliably estimate the compression ratio. Any pointers to where this byte count can be found in the Cassandra code base would be welcome.

Thanks in advance.

1 answer

Compression will never give guaranteed compression ratios. The best you can get is the average ratio for sample data.

So take some sample data, insert it into a test instance, and measure the disk usage.
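As a rough sketch of the measuring step, the function below sums the sizes of the SSTable data files under a table's data directory. The function name and the directory you pass in are assumptions for illustration; the `-Data.db` suffix is how Cassandra names SSTable data files, and snapshot and backup directories are skipped because they hold copies of older SSTables. Data only reaches these files once the memtable has been flushed (for example with `nodetool flush`), so flush before each measurement.

```python
from pathlib import Path


def sstable_bytes_on_disk(table_dir):
    """Sum the sizes of SSTable data files (*-Data.db) under table_dir.

    table_dir is assumed to be the on-disk directory of one table in a
    test instance; adjust it to your own data directory layout.
    """
    root = Path(table_dir)
    total = 0
    for path in root.rglob("*-Data.db"):
        # Ignore snapshots and backups: they contain copies of old SSTables
        # and would inflate the figure.
        relative_parts = path.relative_to(root).parts
        if "snapshots" in relative_parts or "backups" in relative_parts:
            continue
        total += path.stat().st_size
    return total
```

Calling it once before and once after loading the sample data, and dividing the difference by the raw size of what you inserted, gives the effective ratio for that sample. Compaction can change the number afterwards, so it is worth measuring again once the instance has settled.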

Your data may compress very poorly with Snappy, and could even take up more disk space than the raw bytes would.
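The effect is easy to reproduce offline. The sketch below compresses a sample in fixed-size chunks, roughly the way Cassandra compresses SSTables chunk by chunk, and reports compressed size divided by raw size. Two assumptions to keep in mind: it uses `zlib` from the Python standard library as a stand-in for Snappy, so the absolute numbers will differ from what Snappy gives, and the 64 KB chunk size is just a placeholder for whatever chunk length your table is configured with. The sample payloads are made up for illustration.

```python
import os
import zlib

CHUNK_SIZE = 64 * 1024  # assumed chunk length; match your table's setting


def chunked_ratio(data, chunk_size=CHUNK_SIZE):
    """Return compressed size / raw size when data is compressed per chunk.

    zlib is a stand-in for Snappy here, not what Cassandra actually runs.
    """
    if not data:
        raise ValueError("need at least one byte of sample data")
    compressed = 0
    for start in range(0, len(data), chunk_size):
        # Each chunk is compressed independently of the others.
        compressed += len(zlib.compress(data[start:start + chunk_size]))
    return compressed / len(data)


# Hypothetical samples: highly repetitive text versus random bytes.
repetitive = b"user_id=42;status=active;" * 10_000
random_bytes = os.urandom(256 * 1024)

print(f"repetitive: {chunked_ratio(repetitive):.3f}")
print(f"random:     {chunked_ratio(random_bytes):.3f}")
```

A ratio above 1 means the compressed form is larger than the input, which is what random or already-compressed data tends to produce because every chunk still carries the compressor's framing overhead.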

When it comes to compressing your data, there is one and only one rule: MEASURE.
