Standard Column Family and Super Column Family

I read somewhere that for a row with thousands of columns in a standard column family, the best design is to break them down into super columns. Reads would then be very efficient, since Cassandra would only need to load and return the columns under the given super column name, instead of loading and possibly returning thousands of columns. Can anyone confirm?

+4
2 answers

This is not good advice. There are currently very few use cases for which super columns are the best solution. The new CompositeTypes are a better solution for most of the cases where super columns have been used historically.

With that said, it looks like you don't need CompositeTypes here either. It is true that if you are reading a very large row, you should not pull down the entire row at once. Instead, you should fetch parts of the row in contiguous slices.

Basically, you will run a series of get_slice() calls. For the first, set the column count to, say, 1000, and the start column to "". Then take the last column name from that result set (call it X) and make another call to get_slice() with a column count of 1000, but this time set the start column to X. Discard the first column returned (this will be X), and repeat the whole get_slice() process until a call returns fewer than 1000 columns, which signals that you have reached the end of the row.

You might want to fetch more or fewer than 1000 columns at a time, depending on the size of the columns.
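
As a rough sketch of that paging loop in Python: the `get_slice(start, count)` callable below is a hypothetical stand-in for the real Thrift get_slice() call (it is simulated over an in-memory sorted list, not a Cassandra client), and `make_fake_get_slice` and `iter_row` are illustrative names, not part of any library. It assumes the start column is inclusive, which is why the first column of every page after the first is skipped.

```python
from bisect import bisect_left


def make_fake_get_slice(columns):
    """Hypothetical stand-in for a get_slice() call against one row.

    `columns` is a list of (name, value) pairs sorted by column name.
    """
    names = [name for name, _ in columns]

    def get_slice(start, count):
        # The start column is inclusive; "" means "from the start of the row".
        i = bisect_left(names, start) if start else 0
        return columns[i:i + count]

    return get_slice


def iter_row(get_slice, page_size=1000):
    """Yield every column of a wide row, fetching page_size columns per call."""
    if page_size < 2:
        # With a page of 1, each call would only return X again.
        raise ValueError("page_size must be at least 2")
    start = ""
    first_page = True
    while True:
        page = get_slice(start, page_size)
        # After the first call, the first column returned is X (already seen).
        fresh = page if first_page else page[1:]
        for column in fresh:
            yield column
        if len(page) < page_size:
            break  # a short page means the end of the row was reached
        start = page[-1][0]  # X: the last column name in this page
        first_page = False
```

With a real client, you would replace the fake with a function that issues the actual slice request for your row key and returns (name, value) pairs in column order; the loop itself stays the same.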

+6

If there will be many columns, or the data needs to be indexed, it is better to use a standard column family, because: 1) the sub-columns of a super CF are not indexed, and 2) any request for a sub-column deserializes all of the sub-columns in that super column. This may just be a limitation of the current code base, though; see http://wiki.apache.org/cassandra/CassandraLimitations

0
