How does Cassandra split keyspace data when multiple directories are configured?

Question

I configured three separate data directories in the cassandra.yaml file, as shown below:

  data_file_directories:
     - E:/Cassandra/data/var/lib/cassandra/data
     - K:/Cassandra/data/var/lib/cassandra/data

When I create a keyspace and insert data, the keyspace is created in both directories and the data is scattered between them. How does Cassandra split data between multiple directories, and what rule does it follow?

cassandra

vignesh kumar rathakumar Apr 10 '13 at 12:16

2 answers

My best guess at how keyspace data is split across multiple data directories: SSTables of the same column family are written to different data directories, chosen according to each directory's available space and current load.

vignesh kumar rathakumar Apr 23 '13 at 6:03

zznate · Accepted Answer · 2014-03-08T18:40:28+0000

You are using Cassandra's JBOD feature when you add multiple entries under data_file_directories. Data is spread across the configured disks in proportion to their available space.

This will also allow you to take advantage of the disk_failure_policy setting. You can read about the details here: http://www.datastax.com/dev/blog/handling-disk-failures-in-cassandra-1-2

In short, you can configure Cassandra to keep running, doing what it can, if a disk fills up or fails completely. This has an advantage over RAID0 (where you would have effectively the same capacity as JBOD): you do not have to restore the entire data set from backup (or run a full repair), only repair the missing data. RAID0, on the other hand, provides higher throughput (depending on how well you tune the RAID array to match the file system and drive geometry).
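
As a rough sketch of how that might look in cassandra.yaml (the paths are the ones from the question, and best_effort is picked here only for illustration; check the comments in the cassandra.yaml shipped with your version for the values it supports):

    data_file_directories:
        - E:/Cassandra/data/var/lib/cassandra/data
        - K:/Cassandra/data/var/lib/cassandra/data

    # Policy for data disk failures:
    #   stop        - shut down gossip and Thrift; the node is effectively dead
    #                 but can still be inspected via JMX
    #   best_effort - stop using the failed disk and answer requests from the
    #                 SSTables that remain available (reads at CL.ONE may
    #                 return obsolete data)
    #   ignore      - ignore fatal errors and let requests fail
    disk_failure_policy: best_effort

With best_effort the node stays up on the surviving directory, and you then run a repair to recover the data that lived on the failed disk.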

If you have the resources for a fault-tolerant, higher-performance RAID configuration (e.g. RAID10), you may want to use a single directory for simplicity. Most deployments are starting to lean toward the density route, though, using JBOD rather than system-level RAID.

You can read about the thought process behind this problem here: https://issues.apache.org/jira/browse/CASSANDRA-4292
