Cassandra for a schemaless db: tables on the order of tens of millions of records and millions of queries per day

Question

I am creating a database with the following characteristics:

Schemaless database with a variable number of columns for each row.
Tens of millions of records and tens of columns.
Millions of queries per day.
Thousands of writes a day.
Queries will be filtered on multiple columns (not just the key).

I am considering Cassandra, which is built for scale.

My questions:

Do I need to scale horizontally in this case?
Does Cassandra support multiple keys pointing to the same column family?

EDIT

I would like to make sure that I understood correctly. The following example illustrates what I took from your answer:

So, if we have the following column family (it contains some store products and their details)

    products // column-family name
    {
        x = { "id":"x", // this is the unique id for the row
              "name":"Laptop",
              "screen":"15 inch",
              "OS":"Windows" }
        y = { "id":"y", // this is the unique id for the row
              "name":"Laptop",
              "screen":"17 inch" }
        z = { "id":"z", // this is the unique id for the row
              "name":"Printer",
              "page per minute":"20 pages" }
    }

And we want to add the search parameter "name", we will create another copy of the CF with different row keys as follows:

    products
    {
        "x:name:Laptop"  = { "id":"x", "name":"Laptop", "screen":"15 inch", "OS":"Windows" }
        "y:name:Laptop"  = { "id":"y", "name":"Laptop", "screen":"17 inch" }
        "z:name:Printer" = { "id":"z", "name":"Printer", "ppm":"20 pages" }
    }

Similarly, to add the search parameter "screen":

    products
    {
        "x:screen:15 inch" = { "id":"x", "name":"Laptop", "screen":"15 inch", "OS":"Windows" }
        "y:screen:17 inch" = { "id":"y", "name":"Laptop", "screen":"17 inch" }
    }

But if we want to query on 10 search parameters or any combination of them (as is the case in my application), then we will need to create 1023 copies of the column family [(2 to the power 10) - 1]. And since most rows will have many of the search parameters, this means that we need about 1000 times more space to model the data (in this way), which is not small, especially if we have 10,000,000 rows in the source CF.
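A quick sanity check of that count, as a Python sketch. The parameter names p1..p10 and the function name are placeholders used only for illustration:

```python
from itertools import combinations

def count_search_key_variants(params):
    """Count the non-empty combinations of the given search parameters."""
    return sum(
        1
        for r in range(1, len(params) + 1)
        for _ in combinations(params, r)
    )

# Ten hypothetical search parameters.
params = [f"p{i}" for i in range(1, 11)]
variants = count_search_key_variants(params)
print(variants)  # 1023, i.e. 2**10 - 1
```

The count doubles with every extra search parameter, which is why storing one copy per combination gets expensive quickly.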

Is this the data model you are proposing?

Another point: I can't understand exactly why creating secondary indexes would defeat the schemaless requirement.

cassandra

Ababneh A, Aug 7 '12 at 13:26

1 answer

ɭɘ ɖɵʊɒɼɖ 江戸 · Answer 1 · 2012-08-09T15:48:40+0000

Cassandra is a db that you cannot query by anything other than the row key. But you can shape your data model to support these queries.

We execute 175,000,000 queries per day on our 6-node Cassandra cluster (only six nodes!), but we only query data by row keys and columns, because we designed our data model that way. We do not use indexed queries.

To support richer queries, we denormalize our data: the values we intend to search by become part of the keys under which the data is stored.

Example: We save the following object:

    obj {
        id : xxx  // assuming id is a unique id across the system
        p1 : value1
        p2 : value2
    }

If we know that we want to search by any of these parameters, we save a copy of obj under column names or row keys like these:

    "p1:value1:xxx"
    "p2:value2:xxx"
    "p1:value1:p2:value2:xxx"
    "xxx"

So we can look up obj by p1 = value1, by p2 = value2, by p1 = value1 AND p2 = value2, or just by the unique identifier xxx.
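A minimal Python sketch of how such keys could be generated, assuming each key is a run of name:value pairs joined by colons with the unique id last, as in the list above. The function name lookup_keys and the fixed (sorted) parameter order are assumptions made for illustration; the answer does not specify them.

```python
from itertools import combinations

def lookup_keys(uid, params):
    """Return every key under which a copy of the object would be saved.

    params maps search-parameter name -> value; uid is the unique id.
    """
    # Assumption: names are sorted so writers and readers build identical keys.
    names = sorted(params)
    keys = [uid]  # plain lookup by unique id
    for r in range(1, len(names) + 1):
        for combo in combinations(names, r):
            prefix = ":".join(f"{name}:{params[name]}" for name in combo)
            keys.append(f"{prefix}:{uid}")
    return keys

keys = lookup_keys("xxx", {"p1": "value1", "p2": "value2"})
for key in keys:
    print(key)
```

In practice you would only generate the combinations you actually query on, not all of them.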

The only other option, if you do not want to do this, is to use secondary indexes and indexed queries, but that would defeat the "schemaless" requirement of your question.

EDIT: an example.

We want to save Products objects defined as

    class Products {
        string uid;
        string name;
        int screen_size; // in inches
        string os;
        string brand;
    }

We serialize it to a string or a byte array (I have a penchant for Jackson JSON or Protobuf; both work very well with Cassandra and are very fast). We put this byte array in a column.
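As an illustration of that step, here is a sketch that uses Python's standard json module in place of Jackson or Protobuf (swapped in only to keep the example dependency-free). The field names follow the Products class above; the helper names are hypothetical.

```python
import json

def serialize_product(product):
    """Serialize a product dict to UTF-8 JSON bytes, usable as a column value."""
    return json.dumps(product, sort_keys=True, separators=(",", ":")).encode("utf-8")

def deserialize_product(blob):
    """Turn the stored bytes back into a product dict."""
    return json.loads(blob.decode("utf-8"))

product = {
    "uid": "MI615FMDO548",
    "name": "SFG-0098",
    "screen_size": 15,
    "os": "Android JellyBean",
    "brand": "Samsung",
}
blob = serialize_product(product)
restored = deserialize_product(blob)
```

Because every copy stores the whole serialized object, a single read returns the full product with no second lookup.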

Now the important part: creating the column names and row keys. Say we want to search by screen size and possibly filter by brand. We define buckets for screen size as ["0_to_15", "16_to_21", "21_up"].

For this column:

    {uid:"MI615FMDO548", name:"SFG-0098", screen_size:15, os:"Android JellyBean", brand:"Samsung"}

one copy is saved with each of:

- key = "brand:Samsung" and column_name = "screen_size:15_uid:MI615FMDO548"
- key = "screen_size:0_to_15" and column_name = "brand:Samsung_uid:MI615FMDO548"

Why am I adding the uid to the column name? So that every product gets its own unique column name, even when two products share the same brand and screen size.
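The key construction can be sketched in Python as follows. The function names are hypothetical, and placing a 21-inch screen in the middle bucket is an assumption, since the bucket list leaves that boundary open.

```python
def screen_size_bucket(inches):
    """Map a screen size in inches to one of the three bucket labels."""
    if inches <= 15:
        return "0_to_15"
    if inches <= 21:  # assumption: 21 belongs to the middle bucket
        return "16_to_21"
    return "21_up"

def index_entries(product):
    """Return the (row_key, column_name) pairs under which a copy is stored."""
    uid = product["uid"]
    brand = product["brand"]
    size = product["screen_size"]
    return [
        (f"brand:{brand}", f"screen_size:{size}_uid:{uid}"),
        (f"screen_size:{screen_size_bucket(size)}", f"brand:{brand}_uid:{uid}"),
        (f"uid:{uid}", "product"),
    ]

entries = index_entries(
    {"uid": "MI615FMDO548", "name": "SFG-0098", "screen_size": 15,
     "os": "Android JellyBean", "brand": "Samsung"}
)
for row_key, column_name in entries:
    print(row_key, "=>", column_name)
```

Two products with the same brand and screen size still produce different column names, because the uid is the last component.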

Example, part 2. Now let's say we added:

    "{uid:"MI615FMDO548", name:"SFG-0098", screen_size:15, os:"Android JellyBean", brand:"Samsung"}"
    "{uid:"MI615FMD5589", name:"SFG-0097", screen_size:14, os:"Android JellyBean", brand:"Samsung"}"
    "{uid:"MI615FMD1111", name:"SFG-0098", screen_size:17, os:"Android JellyBean", brand:"Samsung"}"
    "{uid:"MI615FMDO687", name:"SFG-0095", screen_size:13, os:"Android JellyBean", brand:"Samsung"}"

As a result, we get the following column family:

    Products {
      -Row: "brand:Samsung"
          => "screen_size:13_uid:MI615FMDO687" : "{uid:"MI615FMDO687", name:"SFG-0095", screen_size:13, os:"Android JellyBean", brand:"Samsung"}"
          => "screen_size:14_uid:MI615FMD5589" : "{uid:"MI615FMD5589", name:"SFG-0097", screen_size:14, os:"Android JellyBean", brand:"Samsung"}"
          => "screen_size:15_uid:MI615FMDO548" : "{uid:"MI615FMDO548", name:"SFG-0098", screen_size:15, os:"Android JellyBean", brand:"Samsung"}"
          => "screen_size:17_uid:MI615FMD1111" : "{uid:"MI615FMD1111", name:"SFG-0098", screen_size:17, os:"Android JellyBean", brand:"Samsung"}"
      -Row: "screen_size:0_to_15"
          => "brand:Samsung_uid:MI615FMDO687" : "{uid:"MI615FMDO687", name:"SFG-0095", screen_size:13, os:"Android JellyBean", brand:"Samsung"}"
          => "brand:Samsung_uid:MI615FMD5589" : "{uid:"MI615FMD5589", name:"SFG-0097", screen_size:14, os:"Android JellyBean", brand:"Samsung"}"
          => "brand:Samsung_uid:MI615FMDO548" : "{uid:"MI615FMDO548", name:"SFG-0098", screen_size:15, os:"Android JellyBean", brand:"Samsung"}"
      -Row: "screen_size:16_to_21"
          => "brand:Samsung_uid:MI615FMD1111" : "{uid:"MI615FMD1111", name:"SFG-0098", screen_size:17, os:"Android JellyBean", brand:"Samsung"}"
      -Row: "uid:MI615FMDO687"
          => "product" : "{uid:"MI615FMDO687", name:"SFG-0095", screen_size:13, os:"Android JellyBean", brand:"Samsung"}"
      -Row: "uid:MI615FMD5589"
          => "product" : "{uid:"MI615FMD5589", name:"SFG-0097", screen_size:14, os:"Android JellyBean", brand:"Samsung"}"
      -Row: "uid:MI615FMDO548"
          => "product" : "{uid:"MI615FMDO548", name:"SFG-0098", screen_size:15, os:"Android JellyBean", brand:"Samsung"}"
      -Row: "uid:MI615FMD1111"
          => "product" : "{uid:"MI615FMD1111", name:"SFG-0098", screen_size:17, os:"Android JellyBean", brand:"Samsung"}"
    }

Now, using range queries on the column names within a row, you can search by brand and screen size.
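To show what such a range query does, here is an in-memory stand-in written in Python. It is not a Cassandra client call: the dict plays the role of the "brand:Samsung" row, the values are shortened to the uid, and column_slice is a hypothetical helper that mimics a slice over sorted column names.

```python
from bisect import bisect_left, bisect_right

# Columns of the row "brand:Samsung" (values abbreviated to the uid).
brand_samsung = {
    "screen_size:13_uid:MI615FMDO687": "MI615FMDO687",
    "screen_size:14_uid:MI615FMD5589": "MI615FMD5589",
    "screen_size:15_uid:MI615FMDO548": "MI615FMDO548",
    "screen_size:17_uid:MI615FMD1111": "MI615FMD1111",
}

def column_slice(row, start, end):
    """Return (name, value) pairs with start <= name <= end, sorted by name."""
    names = sorted(row)
    lo = bisect_left(names, start)
    hi = bisect_right(names, end)
    return [(name, row[name]) for name in names[lo:hi]]

# Samsung products with a 14 or 15 inch screen. "~" sorts after "_" in ASCII,
# so "screen_size:15~" is an upper bound for every "screen_size:15_uid:..." name.
hits = column_slice(brand_samsung, "screen_size:14", "screen_size:15~")
for name, uid in hits:
    print(name, uid)
```

Note that the comparison here is plain string ordering, so a single-digit size such as 9 would need zero-padding (09) to sort before 13.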

Hope this was helpful
