Cassandra SSTables and Compaction

I have been studying Cassandra and trying to understand its architecture, and I read the following wiki page: http://wiki.apache.org/cassandra/MemtableSSTable

As I understand the workflow: you send a request to update your table; the request is written to the CommitLog and then to an in-memory table called the Memtable (which can be rebuilt from the CommitLog in case of system failure). When the Memtable reaches a certain size, the entire Memtable is flushed to an SSTable on disk, which can no longer be changed, only merged with other SSTables during compaction. Once a configurable number of SSTables accumulates, a compaction runs that essentially merges them, freeing up disk space and producing a single new, consolidated SSTable. Please correct me if I have misunderstood anything.
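To check my mental model, here is a toy sketch of that flow (hypothetical names, nothing from Cassandra's actual codebase):

```python
# Toy sketch of the write path: append to a commit log for durability,
# buffer in an in-memory memtable, and flush to an immutable SSTable
# once a size threshold is reached. Names are made up for illustration.

class TinyStore:
    def __init__(self, flush_threshold=4):
        self.commit_log = []           # durable, append-only record of writes
        self.memtable = {}             # in-memory buffer, sorted at flush time
        self.sstables = []             # immutable, sorted runs on "disk"
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        self.commit_log.append((key, value))   # 1. durability first
        self.memtable[key] = value             # 2. then the memtable
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # An SSTable is just a sorted, immutable snapshot of the memtable.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}
        self.commit_log.clear()        # flushed data no longer needs the log

store = TinyStore()
for i in range(10):
    store.write(f"k{i}", i)
print(len(store.sstables), "SSTables on disk")   # -> 2 SSTables on disk
```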

Now I have a few questions about compaction. First, how expensive is this operation? If I triggered a compaction whenever there are two SSTables on disk, would the cost be prohibitive, or would I be better off waiting until midnight when usage is low? Does compaction behave better with several small SSTables, or with several very large ones? Does having many uncompacted SSTables hurt read performance? And how does concurrency work with this: what if I am reading from these SSTables while someone does an insert that flushes a new Memtable to disk, which in turn triggers a compaction?

Any information and experience you could provide about this would be great!


Trying to answer every question:

First, how expensive is this operation?

Compaction has to copy everything in the SSTables it compacts (minus anything eliminated by tombstones or overwrites). However, it is cheaper than it might first appear, since compaction uses purely sequential IO, which is fast on rotating disks.

If I triggered a compaction whenever there are two SSTables on disk, would the cost be prohibitive, or would I be better off waiting until midnight when usage is low?

That would make your writes significantly more expensive: imagine that each write produced a new SSTable; each write would then have to compact all the writes that came before it, so the cost of writing N items would be O(N^2).
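To see the quadratic blow-up concretely, here is a toy cost model (assuming, purely for illustration, that every flush produces a single-item SSTable that is immediately compacted with everything written so far):

```python
# Toy cost model for "compact whenever there are two SSTables": the i-th
# write merges with everything written before it, so it costs ~i copies.
N = 10_000
total_copies = sum(range(1, N + 1))   # 1 + 2 + ... + N = N(N+1)/2
print(total_copies)                   # 50005000 copies, i.e. O(N^2)
```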

A better idea is to adopt a compaction strategy like the one used by Acunu's Doubling Array: store each SSTable (aka array) at a "level", and compact the two arrays at a level together whenever there are two of them, promoting the output array to the next level. This can be shown to amortize to O((log N) / B) sequential IOs per write, while bounding the number of arrays to O(log N).
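Here is a minimal sketch of the doubling idea (my own simplification for illustration, not Castle's actual code): the levels behave like binary digits, and merging two runs at a level carries the result upward, so each item is copied only O(log N) times in total.

```python
import heapq

def insert_run(levels, run):
    """Insert a sorted run; merge and carry upward while a level is occupied."""
    level = 0
    while level < len(levels) and levels[level] is not None:
        run = list(heapq.merge(levels[level], run))  # purely sequential merge
        levels[level] = None
        level += 1
    if level == len(levels):
        levels.append(None)
    levels[level] = run

levels = []
for i in range(8):
    insert_run(levels, [(f"k{i}", i)])    # each flush is a one-item run

# Eight one-item runs end up as a single run of 8 at level 3, like 0b1000.
print([len(r) if r else 0 for r in levels])   # -> [0, 0, 0, 8]
```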

The doubling scheme is implemented in Castle, an open-source storage engine for Cassandra. For more information see here:

NB: I work for Acunu.

Does compaction behave better with several small SSTables, or with several very large ones?

Compactions of smaller SSTables take less time, but you will have to do more of them. It's horses for courses, really. The number and size of SSTables do affect read performance, though (see the next question).

Does having many uncompacted SSTables hurt read performance?

Not much, for point reads: Cassandra (and Castle) has Bloom filters to avoid looking in SSTables that it knows cannot contain the key, and it can finish early once it has found the correct value (using timestamps on the values and SSTables).
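As a toy model of that read path (the Bloom filter is faked here with an exact set, which never gives false positives; a real filter trades a small error rate for far less memory):

```python
class SSTable:
    def __init__(self, rows):          # rows: {key: (timestamp, value)}
        self.rows = rows
        self.bloom = set(rows)         # stand-in for a real Bloom filter

def read(sstables, key):
    best = None
    for table in sstables:
        if key not in table.bloom:     # "filter says no" -> skip the disk IO
            continue
        ts, value = table.rows[key]
        if best is None or ts > best[0]:
            best = (ts, value)         # keep the newest version by timestamp
    return best[1] if best else None

tables = [SSTable({"a": (1, "old")}), SSTable({"a": (2, "new"), "b": (2, "x")})]
print(read(tables, "a"))               # -> "new"
```

(Real Cassandra can also stop early using SSTable-level timestamps, rather than checking every table as this sketch does.)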

However, with get_slice requests you cannot exit early, so you have to visit every SSTable that might contain a value in your row; if you have many SSTables, your get_slice calls will be slower.

The situation is even worse for get_range_slices, where you cannot use the Bloom filter at all, so every call must visit every SSTable. The performance of these calls will be inversely proportional to the number of SSTables you have.
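A sketch of why that is (toy code, ignoring overwrites and tombstones): a Bloom filter answers "is this exact key here?", not "does this key range overlap this table?", so a range slice has to merge-scan every run.

```python
import heapq

def get_range_slice(sstables, lo, hi):
    # sstables: sorted (key, value) runs; every run must be visited.
    merged = heapq.merge(*sstables)
    return [(k, v) for k, v in merged if lo <= k <= hi]

tables = [[("a", 1), ("c", 3)], [("b", 2), ("d", 4)]]
print(get_range_slice(tables, "b", "c"))   # -> [('b', 2), ('c', 3)]
```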

Moreover, with thousands of SSTables, the Bloom filter false-positive rate (~1%) starts to hurt, because for every lookup you will end up searching tens of SSTables that do not contain the value! For example, at 1,000 SSTables and a 1% false-positive rate, each lookup wastes roughly 10 SSTable reads.

How does concurrency work with this: what if I am reading from these SSTables while someone does an insert that flushes a new Memtable to disk, which in turn triggers a compaction?

In Cassandra, an SSTable is deleted from disk once there are no more references to it in memory (as decided by the garbage collector). So reads do not need to worry, and old SSTables are cleaned up lazily.
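A sketch of that lifecycle (hypothetical names, using an explicit reference count as a stand-in for what the JVM garbage collector effectively does for Cassandra):

```python
class SSTableRef:
    def __init__(self, path):
        self.path, self.refs, self.obsolete = path, 0, False

    def acquire(self):                 # a reader pins the table before scanning
        self.refs += 1
        return self

    def release(self):                 # last reader out deletes an obsolete file
        self.refs -= 1
        if self.obsolete and self.refs == 0:
            print(f"deleting {self.path}")

    def mark_obsolete(self):           # called once compaction has replaced it
        self.obsolete = True
        if self.refs == 0:
            print(f"deleting {self.path}")

table = SSTableRef("sstable-1.db")
reader = table.acquire()               # a read is in flight
table.mark_obsolete()                  # compaction done; file kept for the reader
reader.release()                       # -> deleting sstable-1.db
```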

Thanks,

Tom


I wrote about the various compaction strategies supported by Cassandra 1.0 here: http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra

TL;DR: Leveled compaction compacts more aggressively and is therefore recommended for read-heavy workloads.

