I am trying to set up a Cassandra column family with secondary indexes on several columns that I need to filter on when reading data. In my initial testing, queries that combine more than one index slow down badly. Here is my current configuration (via cassandra-cli):
update column family bulkdata with comparator=UTF8Type and column_metadata=[{column_name: test_field, validation_class: UTF8Type}, {column_name: create_date, validation_class: LongType, index_type: KEYS}, {column_name: domain, validation_class: UTF8Type, index_type: KEYS}];
I want to get all the data where create_date > somevalue1 and domain = somevalue2. Using pycassa as my client, I do the following:
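from pycassa.index import create_index_expression, create_index_clause, GT

# Equality on the indexed domain column, plus a range filter on create_date
domain_expr = create_index_expression('domain', 'whatever.com')
cd_expr = create_index_expression('create_date', 1293650000, GT)
clause = create_index_clause([domain_expr, cd_expr], count=10000)

for key, item in col_fam.get_indexed_slices(clause):
    ...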
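To be clear about the result set I expect, here is the same filter written in plain Python over a few made-up rows. This is only a sketch of the semantics, not of how Cassandra evaluates the clause; the row keys, the sample values, and the `expected_matches` helper are all hypothetical.

```python
# Hypothetical in-memory rows standing in for the bulkdata column family.
rows = {
    "key1": {"domain": "whatever.com", "create_date": 1293650500, "test_field": "a"},
    "key2": {"domain": "whatever.com", "create_date": 1293640000, "test_field": "b"},
    "key3": {"domain": "other.com", "create_date": 1293660000, "test_field": "c"},
}


def expected_matches(rows, domain, min_create_date, count=10000):
    """Return up to `count` (key, columns) pairs where the domain matches
    exactly and create_date is strictly greater than min_create_date."""
    matches = []
    for key, columns in rows.items():
        if "create_date" not in columns:
            continue  # rows without the column can never satisfy the range filter
        if columns.get("domain") == domain and columns["create_date"] > min_create_date:
            matches.append((key, columns))
            if len(matches) >= count:
                break
    return matches


result = expected_matches(rows, "whatever.com", 1293650000)
```

With these sample rows, only `key1` satisfies both conditions: `key2` has the right domain but an older create_date, and `key3` is recent but belongs to a different domain.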
This is a common pitfall in SQL, of course, where you usually need to create a composite index tailored to the query. I am new to Cassandra, so I don't know whether such a thing is required or even exists.
My workload will involve a large number of inserts and a large number of reads and updates. I set up the indexes believing they were the right tool here, but maybe I'm completely wrong. I would be interested in any ideas on building a performant system, with or without indexes.
Oh, and this is on Cassandra 0.7.0-rc3.