CQL SELECT greater-than query on an indexed non-key column

Question


EDIT1: Added case to describe the problem after the original question.

I want to query a column that is not part of my key. If I understand correctly, I need to define a secondary index on that column. However, I want to use a greater-than condition (not just an equality condition), and that still seems to be unsupported.

Am I missing something? How could you solve this problem?

My desired setup:

Cassandra 1.1.6, CQL3:

    CREATE TABLE Table1(
        KeyA int,
        KeyB int,
        ValueA int,
        PRIMARY KEY (KeyA, KeyB)
    );
    CREATE INDEX ON Table1 (ValueA);
    SELECT * FROM Table1 WHERE ValueA > 3000;

Since defining a secondary index on ColumnFamilies with composite keys is not yet supported in Cassandra 1.1.6, I have to settle for a temporary workaround of dropping one of the keys, but I still have the same problem with non-equality conditions.

Is there any other way to solve this problem?

Thank you for your time.

Relevant sources: http://cassandra.apache.org/doc/cql3/CQL.html#selectStmt http://www.datastax.com/docs/1.1/ddl/indexes

EDIT1

Here is a case that explains the problem. As rs_atl pointed out, this may be a data model problem. Let's say I keep a column family of all users on Stack Overflow. For each user I store a batch of statistics (Reputation, NumOfAnswers, NumOfVotes... they are all int). I want to query on these statistics to get the relevant users.

    CREATE TABLE UserStats(
        UserID int,
        Reputation int,
        NumOfAnswers int,
        . . . A lot of stats... . . .
        NumOfVotes int,
        PRIMARY KEY (UserID)
    );

Now I'm interested in slices of UserIDs based on these stats: give me all users with a reputation over 10K, give me all users with fewer than 5 answers, and so on.

Hope this helps. Thanks again.


indexing cassandra

Oren Nov 27 '12 at 10:55


3 answers

keelar · Answer 1 · 2013-08-09T22:29:47+0000

In CQL, you can apply a WHERE clause to any column once you have created an index for it (i.e. a secondary index). Otherwise, you will receive the following error:

 Bad Request: No indexed columns present in by-columns clause with Equal operator

Unfortunately, even with secondary indexes, the WHERE clause must include at least one EQ (equality) comparison on a secondary index, for performance reasons.

Q: Why is it always necessary to have at least one EQ comparison on secondary indexes?
A: Inequalities on secondary indexes are always evaluated in memory, so without at least one EQ comparison on another secondary index you would be loading every row in the database, which is not a good idea with a massive database. By requiring at least one EQ on a (secondary) index, you hopefully limit the set of rows that must be read into memory to a manageable size. (Although, obviously, you can still get into trouble with that as well.)

So if you have anything other than an EQ comparison, Cassandra loads all the rows that otherwise match your query and checks them one at a time. This is disallowed by default because it "may be slow." (Essentially, the indexes only index "for equality" and not for anything else, such as < and >, which indexes in a relational database would support.)
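To make the point concrete, here is a minimal Python sketch of the idea, not of Cassandra's actual implementation. It assumes a secondary index behaves like a hash map from a column value to the primary keys holding it; the names `build_index` and `select` are hypothetical. The equality part is a single lookup, while the range part has to check every candidate row in memory.

```python
from collections import defaultdict

# Hypothetical in-memory model of a table: primary key -> row.
rows = {
    (1, 2): {"dummyvalue": 0, "valuea": 3},
    (4, 5): {"dummyvalue": 0, "valuea": 6},
    (7, 8): {"dummyvalue": 0, "valuea": 9},
}

def build_index(rows, column):
    """Map each distinct column value to the primary keys that hold it."""
    index = defaultdict(list)
    for key, row in rows.items():
        index[row[column]].append(key)
    return index

def select(rows, index, eq_value, range_column, lower_bound):
    """Equality lookup through the index, then an in-memory range filter."""
    candidates = index.get(eq_value, [])  # one lookup, no scan
    # Every candidate row is inspected here; this is the costly part.
    return [k for k in candidates if rows[k][range_column] > lower_bound]

dummy_index = build_index(rows, "dummyvalue")
result = select(rows, dummy_index, 0, "valuea", 5)
print(result)
```

If the equality condition matches most of the table, the filter step touches most of the table too, which is why the equality comparison only helps when it is selective.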

Note that if you have multiple non-equality conditions on secondary indexes, you also need to add the ALLOW FILTERING keyword to your query, otherwise you will get:

Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING

One easy workaround is to add a dummy column to your table in which every row has the same value. You can then put the equality condition on the dummy column and run the range query on just the column you care about. Be aware that such queries on a NoSQL database can be slow and can bog down the system.

Example

    cqlsh:demo> desc table table1;

    CREATE TABLE table1 (
      keya int,
      keyb int,
      dummyvalue int,
      valuea int,
      PRIMARY KEY (keya, keyb)
    ) ....

    cqlsh:demo> select * from Table1;

     keya | keyb | dummyvalue | valuea
    ------+------+------------+--------
        1 |    2 |          0 |      3
        4 |    5 |          0 |      6
        7 |    8 |          0 |      9

Create secondary indexes for ValueA and DummyValue:

    cqlsh:demo> create index table1_valuea on table1 (valuea);
    cqlsh:demo> create index table1_valueb on table1 (dummyvalue);

Run a range query on ValueA with DummyValue = 0:

    cqlsh:demo> select * from table1 where dummyvalue = 0 and valuea > 5 allow filtering;

     keya | keyb | dummyvalue | valuea
    ------+------+------------+--------
        4 |    5 |          0 |      6
        7 |    8 |          0 |      9

rs_atl · Answer 2 · 2012-11-27T17:30:49+0000

Probably the most flexible way to handle this scenario in Cassandra would be to have a separate CF for each stat, with sentinel values as keys and a stat value in the column name, for example:

    CF: StatName {
        Key: SomeSentinelValue {
            [Value]:[UserID] = ""
        }
    }

So let's say your stat is NumAnswers, and your user IDs are strings:

    CF: NumAnswers {
        Key: 0 {
            150:Joe = ""
            200:Bob = ""
            500:Sue = ""
        }
        Key: 1000 {
            1020:George = ""
            1300:Ringo = ""
            1300:Mary = ""
        }
    }

So you can see that your keys are essentially buckets of values, which can be as coarse- or fine-grained as your data warrants, and your columns are composites of value + user ID. Now you can give Cassandra a known key (or set of keys) for the range you want (the equality part), and then do a range query on the first component of the column name. Note that you can't store the user ID as the column value, because that would prevent two users from having the same count.
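The bucket layout above can be sketched in plain Python. This is only an illustration of the data model, not Cassandra client code: the bucket width `BUCKET_SIZE` and the helpers `insert` and `users_at_least` are assumptions made for the example. Each bucket keeps its `(value, user_id)` composites sorted, so a range over the value is a binary search plus a slice inside a known bucket.

```python
import bisect

BUCKET_SIZE = 1000  # assumed bucket width; pick it to suit the data

# One "column family" per stat: bucket key -> sorted (value, user_id) columns.
num_answers = {}

def insert(cf, value, user_id):
    """Place the composite column in the bucket that covers its value."""
    bucket = (value // BUCKET_SIZE) * BUCKET_SIZE
    bisect.insort(cf.setdefault(bucket, []), (value, user_id))

def users_at_least(cf, bucket, minimum):
    """Range-scan one known bucket for columns whose value is >= minimum."""
    columns = cf.get(bucket, [])
    # (minimum,) sorts before every (minimum, user_id) composite.
    start = bisect.bisect_left(columns, (minimum,))
    return [user for _, user in columns[start:]]

for value, user in [(150, "Joe"), (200, "Bob"), (500, "Sue"),
                    (1020, "George"), (1300, "Ringo"), (1300, "Mary")]:
    insert(num_answers, value, user)

print(users_at_least(num_answers, 0, 200))
print(users_at_least(num_answers, 1000, 1300))
```

Because the user ID is part of the composite, Ringo and Mary can both sit at 1300 without overwriting each other. A query that spans several buckets would repeat the scan for each bucket key in the range.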

rogerdpack · Answer 3 · 2018-02-12T21:47:07+0000

PRIMARY KEY (KeyA, KeyB));

CREATE INDEX ON Table1 (ValueA);

SELECT * FROM Table1 WHERE ValueA > 3000;

The Cassandra way would be to have some partition key and always query by it, with ValueA as a clustering column, possibly PRIMARY KEY ((KeyA, KeyB), ValueA), and then use it like:

select * from Table1 where KeyA = 1 and KeyB = 2 and ValueA > 3000;
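As a rough illustration of why this works, the following Python sketch models a partition as a list of ValueA entries kept in sorted order, the way a clustering column keeps rows ordered within a partition. It is a simplified model under that assumption, and the names `table`, `insert`, and `select_greater_than` are hypothetical. The partition key is an exact lookup and the range is a slice of already-sorted data.

```python
import bisect

# Hypothetical model: partition key (KeyA, KeyB) -> ValueA entries in sorted order.
table = {}

def insert(key_a, key_b, value_a):
    """Add a row; the clustering value stays sorted within its partition."""
    bisect.insort(table.setdefault((key_a, key_b), []), value_a)

def select_greater_than(key_a, key_b, threshold):
    """Exact partition lookup, then a slice over the sorted clustering values."""
    partition = table.get((key_a, key_b), [])
    start = bisect.bisect_right(partition, threshold)  # first entry > threshold
    return partition[start:]

for key_a, key_b, value_a in [(1, 2, 4200), (1, 2, 2500), (1, 2, 3000),
                              (1, 2, 3500), (1, 3, 9000)]:
    insert(key_a, key_b, value_a)

print(select_greater_than(1, 2, 3000))
```

The trade-off is that the range only applies inside one partition, so the model has to be designed around the partitions you expect to query.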
