How to filter data in Cassandra?

I have been using MySQL for an application for some time, and the more data I collect, the slower it gets, so I have been looking into NoSQL options. One of the things I have in MySQL is a view built from a bunch of joins. The application shows all the important information in a grid, and the user can select ranges, run searches, and so on against this dataset. Standard query stuff.

Looking at Cassandra, everything is already sorted based on the parameters I provide in my storage-conf.xml. So I would have a certain row as my key in a SuperColumn, with a bunch of data in the columns beneath it. But I can only sort by one column, and I can't do any real searching inside the columns without pulling out all the SuperColumns and sorting through the data, right?

I do not want to duplicate data in different ColumnFamilies, so I want to make sure Cassandra is a good fit for me. Facebook, Digg, and Twitter all have plenty of search features, so maybe I am just not seeing the solution.

Is there a way with Cassandra to search or filter on specific data values in a SuperColumn or its associated columns? If not, is there another NoSQL option?

In the example below it seems that I can only query by phatduckk, friend1, John, etc. But what if I want to find everyone in the ColumnFamily who lives in city == "Beverley Hills"? Can this be done without returning all records? If so, can I search for city == "Beverley Hills" and state == "CA"? It doesn't seem like I can do that either, but I want to be sure what my options are.

AddressBook = { // this is a ColumnFamily of type Super
    phatduckk: { // this is the key to this row inside the Super CF
        friend1: {street: "8th street", zip: "90210", city: "Beverley Hills", state: "CA"},
        John: {street: "Howard street", zip: "94404", city: "FC", state: "CA"},
        Kim: {street: "X street", zip: "87876", city: "Balls", state: "VA"},
        Tod: {street: "Jerry street", zip: "54556", city: "Cartoon", state: "CO"},
        Bob: {street: "Q Blvd", zip: "24252", city: "Nowhere", state: "MN"},
    }, // end row
    ieure: {
        joey: {street: "A ave", zip: "55485", city: "Hell", state: "NV"},
        William: {street: "Armpit Dr", zip: "93301", city: "Bakersfield", state: "CA"},
    },
}

4 answers

You cannot perform this kind of operation in Cassandra. There are certain kinds of selection predicates that can be specified on column names, but none on the values they store. Take a look at the API and check the get_slice / get_superslice and get_range request types. Again, these all work on the keys in a ColumnFamily or SuperColumnFamily, not on the values.
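To make the cost concrete, here is a rough sketch of what filtering by value looks like when the store can only hand back rows by key: the client has to pull everything and check each entry itself. Plain Python dicts stand in for the Super ColumnFamily from the question (trimmed to three entries); `find_by_values` is a made-up helper for illustration, not part of any Cassandra client.

```python
# Plain dicts stand in for the Super ColumnFamily; no Cassandra client is involved.
address_book = {
    "phatduckk": {
        "friend1": {"street": "8th street", "zip": "90210", "city": "Beverley Hills", "state": "CA"},
        "John": {"street": "Howard street", "zip": "94404", "city": "FC", "state": "CA"},
    },
    "ieure": {
        "William": {"street": "Armpit Dr", "zip": "93301", "city": "Bakersfield", "state": "CA"},
    },
}


def find_by_values(rows, **wanted):
    """Scan every row and every super column, keeping entries whose columns match."""
    matches = []
    for row_key, super_columns in rows.items():
        for name, columns in super_columns.items():
            if all(columns.get(col) == val for col, val in wanted.items()):
                matches.append((row_key, name))
    return matches


print(find_by_values(address_book, city="Beverley Hills", state="CA"))
```

Every call touches every record, which is exactly the "return all records" situation the question is trying to avoid.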

If you need the functionality you described, a SQL database is the better choice. Create the right indexes on your tables, especially on the most heavily queried columns, and you will see a big difference in query performance. Hope this helps.

"You do not want to duplicate data in different ColumnFamilies," but that is exactly how you run this kind of query in Cassandra. See http://maxgrinev.com/2010/07/12/do-you-really-need-sql-to-do-it-all-in-cassandra/
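As a rough illustration of the duplication idea, the sketch below builds a second lookup structure keyed by the values you want to search on, which is the role an extra "index" ColumnFamily would play. It uses plain Python dicts rather than a Cassandra client, the data is a trimmed copy of the question's example, and `build_index` is a hypothetical helper name.

```python
from collections import defaultdict

# Trimmed copy of the question's data; dicts stand in for the Super ColumnFamily.
address_book = {
    "phatduckk": {
        "friend1": {"city": "Beverley Hills", "state": "CA"},
        "John": {"city": "FC", "state": "CA"},
    },
    "ieure": {
        "William": {"city": "Bakersfield", "state": "CA"},
    },
}


def build_index(rows, *columns):
    """Map a tuple of column values to the (row key, super column name) pairs holding them."""
    index = defaultdict(list)
    for row_key, super_columns in rows.items():
        for name, cols in super_columns.items():
            index_key = tuple(cols[c] for c in columns)
            index[index_key].append((row_key, name))
    return index


# One index per query shape: the search key becomes the row key of the index.
by_city_state = build_index(address_book, "city", "state")
by_state = build_index(address_book, "state")

print(by_city_state.get(("Beverley Hills", "CA"), []))
print(by_state.get(("CA",), []))
```

The lookup is now a single key read instead of a scan, at the price of storing the location data twice and keeping both copies in step.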

Super column families do not support secondary indexes, but regular column families do. With a secondary index in place, you can use the getWhere method.

Here is one example taken from one of my PHP projects:

    public function GetCodeWithValue( $_value ) {
        $result = $this->getDbFamily()->getWhere(array('value' => $_value, 'used' => 0));
        if ( $this->IsValid( $result ))
            return $result->key();
        else
            return null;
    }

This code uses this Cassandra API: https://github.com/kallaspriit/Cassandra-PHP-Client-Library

Note that since the question was asked, Cassandra has added support for indexes that are managed automatically by the Cassandra system (I think starting from 0.8). For some people this may answer the question, as an alternative to managing your own index.

http://www.datastax.com/docs/1.1/dml/using_cli#indexing-a-column

That said, I also wanted to mention that when you create an index, a SQL database duplicates a lot of your data in order to build that index. Doing the same thing yourself in Cassandra can be really cheap, because you can optimize it. The main problem is that you have to maintain the consistency that SQL handles transparently for you. But both mechanisms rely on exactly the same theoretical concept.
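A small sketch of what "maintaining the consistency yourself" means in practice: every write has to update the data and the hand-rolled index together, including removing the stale index entry when an indexed value changes. This is plain in-memory Python, not Cassandra code; `IndexedAddressBook` and its methods are invented for illustration.

```python
class IndexedAddressBook:
    """Toy store that keeps a by-city index in step with the data on every write."""

    def __init__(self):
        self.entries = {}  # (row_key, name) -> columns
        self.by_city = {}  # city -> set of (row_key, name)

    def put(self, row_key, name, columns):
        entry_id = (row_key, name)
        old = self.entries.get(entry_id)
        if old is not None:
            # Drop the stale index entry before writing the new one.
            stale = self.by_city.get(old["city"], set())
            stale.discard(entry_id)
            if not stale:
                self.by_city.pop(old["city"], None)
        self.entries[entry_id] = dict(columns)
        self.by_city.setdefault(columns["city"], set()).add(entry_id)

    def find_by_city(self, city):
        return sorted(self.by_city.get(city, set()))


book = IndexedAddressBook()
book.put("phatduckk", "friend1", {"city": "Beverley Hills", "state": "CA"})
book.put("ieure", "William", {"city": "Bakersfield", "state": "CA"})
# William moves: the old index entry must disappear along with the update.
book.put("ieure", "William", {"city": "Beverley Hills", "state": "CA"})

print(book.find_by_city("Beverley Hills"))
print(book.find_by_city("Bakersfield"))
```

In a real cluster the two writes are separate operations that can fail independently, which is the part a SQL database hides behind its index maintenance.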

It is a bit like reimplementing your own std::string with specializations specific to your application... (e.g. think of QString and CString!)
