Compound indices in Apache Cassandra

I am trying to set up a Cassandra column family with secondary indexes on several columns that I need to filter on when reading data. In my initial testing, using multiple indexes together slows everything down. Here's how I have it configured now (via cassandra-cli):

update column family bulkdata with comparator=UTF8Type and column_metadata=[{column_name: test_field, validation_class: UTF8Type}, {column_name: create_date, validation_class: LongType, index_type: KEYS}, {column_name: domain, validation_class: UTF8Type, index_type: KEYS}]; 

I want to get all the data where create_date > somevalue1 and domain = somevalue2. Using pycassa as my client, I do the following:

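    from pycassa.index import create_index_expression, create_index_clause, GT
    # col_fam is a pycassa ColumnFamily for bulkdata
    domain_expr = create_index_expression('domain', 'whatever.com')
    cd_expr = create_index_expression('create_date', 1293650000, GT)
    clause = create_index_clause([domain_expr, cd_expr], count=10000)
    for key, item in col_fam.get_indexed_slices(clause):
        pass  # process each matching row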
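A likely explanation for the slowdown, assuming Cassandra 0.7's KEYS indexes behave as I understand them: the query uses one indexed equality expression to fetch candidate rows and then checks every other expression against each candidate, so the range expression on create_date filters the results but does not reduce the number of rows read. The toy in-memory model below illustrates that idea. The sample data and the `build_keys_index` and `indexed_query` helpers are hypothetical, and this is not Cassandra's actual code.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for the bulkdata column family.
rows = {
    "k1": {"domain": "whatever.com", "create_date": 1293640000},
    "k2": {"domain": "whatever.com", "create_date": 1293660000},
    "k3": {"domain": "other.com", "create_date": 1293670000},
    "k4": {"domain": "whatever.com", "create_date": 1293680000},
}

OPS = {
    "EQ": lambda a, b: a == b,
    "GT": lambda a, b: a > b,
    "GTE": lambda a, b: a >= b,
    "LT": lambda a, b: a < b,
    "LTE": lambda a, b: a <= b,
}


def build_keys_index(rows, column):
    """Model a KEYS index: one entry per distinct value -> set of row keys."""
    index = defaultdict(set)
    for key, cols in rows.items():
        if column in cols:
            index[cols[column]].add(key)
    return index


def indexed_query(rows, indexes, expressions, count=100):
    """Return (matching keys, number of candidate rows scanned)."""
    # Only equality expressions on indexed columns can drive the lookup.
    eq = [e for e in expressions if e[1] == "EQ" and e[0] in indexes]
    if not eq:
        raise ValueError("need an EQ expression on an indexed column")
    # Use the EQ expression with the fewest candidate keys.
    col, _, val = min(eq, key=lambda e: len(indexes[e[0]].get(e[2], ())))
    results, scanned = [], 0
    for key in sorted(indexes[col].get(val, ())):
        scanned += 1
        cols = rows[key]
        # Every expression, including the range one, is checked row by row.
        if all(c in cols and OPS[op](cols[c], v) for c, op, v in expressions):
            results.append(key)
            if len(results) >= count:
                break
    return results, scanned


indexes = {c: build_keys_index(rows, c) for c in ("domain", "create_date")}
print(len(indexes["domain"]), len(indexes["create_date"]))  # 2 4
hits, scanned = indexed_query(
    rows, indexes,
    [("domain", "EQ", "whatever.com"), ("create_date", "GT", 1293650000)])
print(hits, scanned)  # ['k2', 'k4'] 3
```

In this model every domain row is read and filtered on create_date, and the create_date index holds one entry per distinct timestamp while never being used for the GT expression.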

This is a common problem in SQL, of course, where you usually need to create a composite index tailored to your queries. I am new to Cassandra, so I don't know whether such a thing is required or even exists.

My interactions with Cassandra will involve a large number of rows and a large volume of reads and updates. I set up the indexes believing they were the right thing here, but maybe I'm completely wrong. I would be interested in any ideas on building a performant system, with or without indexes.

Oh, and this is on Cassandra 0.7.0-rc3.

1 answer

Cassandra's native secondary indexes have some limitations. According to the DataStax documentation, they should not be used on high-cardinality columns (columns with too many unique values). The create_date column you are indexing is likely to have high cardinality, since nearly every row may carry a different timestamp. Also, Cassandra's index support has no such thing as a composite index.
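Since there is no composite index, one common workaround is to maintain your own index column family: for example, one wide row per domain whose column names sort by create_date, so that "domain = X and create_date > Y" becomes a single column slice on one row. The sketch below models that layout in memory with a sorted list and `bisect`. The `DomainDateIndex` class and its methods are hypothetical, the design is an assumption rather than the method from the blog post, and it assumes integer (LongType) timestamps. In a real cluster the application would write the index column alongside each data row and delete the stale entry when create_date changes.

```python
import bisect


class DomainDateIndex:
    """Toy model of a hand-maintained index column family (hypothetical).

    Each domain is one wide row; its column names are (create_date, row_key)
    tuples kept sorted, the way a column family keeps columns sorted by its
    comparator within a row.
    """

    def __init__(self):
        self._rows = {}  # domain -> sorted list of (create_date, row_key)

    def add(self, domain, create_date, row_key):
        cols = self._rows.setdefault(domain, [])
        entry = (create_date, row_key)
        i = bisect.bisect_left(cols, entry)
        if i == len(cols) or cols[i] != entry:  # skip duplicates
            cols.insert(i, entry)

    def remove(self, domain, create_date, row_key):
        cols = self._rows.get(domain, [])
        entry = (create_date, row_key)
        i = bisect.bisect_left(cols, entry)
        if i < len(cols) and cols[i] == entry:
            del cols[i]

    def keys_after(self, domain, start_date, count=100):
        """Row keys with create_date > start_date, oldest first (one slice)."""
        cols = self._rows.get(domain, [])
        # Assumes integer timestamps: (start_date + 1, "") sorts before every
        # entry dated after start_date and after every entry dated at or before it.
        i = bisect.bisect_left(cols, (start_date + 1, ""))
        return [row_key for _, row_key in cols[i:i + count]]


idx = DomainDateIndex()
for key, domain, ts in [
    ("k1", "whatever.com", 1293640000),
    ("k2", "whatever.com", 1293660000),
    ("k3", "other.com", 1293670000),
    ("k4", "whatever.com", 1293680000),
]:
    idx.add(domain, ts, key)  # written alongside each data row

print(idx.keys_after("whatever.com", 1293650000))  # ['k2', 'k4']

# If a row's create_date changes, move its index entry.
idx.remove("whatever.com", 1293660000, "k2")
idx.add("whatever.com", 1293690000, "k2")
print(idx.keys_after("whatever.com", 1293650000))  # ['k4', 'k2']
```

The trade-off is extra work on every write (and on every create_date update) in exchange for reads that touch only the matching columns of a single row.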

For more detail, see my blog post http://pkghosh.wordpress.com/2011/03/02/cassandra-secondary-index-patterns/

Pranab
