By adding multiple entries to data_file_directories, you are using Cassandra's JBOD functionality. Data is spread across the configured disks in proportion to their available space.
This will also allow you to take advantage of the disk_failure_policy setting. You can read about the details here: http://www.datastax.com/dev/blog/handling-disk-failures-in-cassandra-1-2
In short, you can configure Cassandra to keep running and do what it can when a disk fills up or fails completely. This has an advantage over RAID0 (which gives you effectively the same capacity as JBOD): you do not need to restore the entire data set from backup (or run a full repair), only repair the missing data. RAID0, on the other hand, provides higher throughput (depending on how well you can tune the RAID array to match the file system and drive geometry).
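As a rough illustration, a JBOD setup in cassandra.yaml might look like the fragment below. The mount points are hypothetical placeholders, and best_effort is only one of the policies described in the linked post (stop, best_effort, ignore), so pick whichever matches how you want the node to behave when a disk fails.

```yaml
# cassandra.yaml (fragment). The paths are made-up examples, one per physical disk.
data_file_directories:
    - /mnt/disk1/cassandra/data
    - /mnt/disk2/cassandra/data
    - /mnt/disk3/cassandra/data

# What to do when a data disk fails:
#   stop        - shut down gossip and Thrift, leaving the node effectively dead
#   best_effort - stop using the failed disk and keep serving from the remaining ones
#   ignore      - ignore fatal errors and let requests fail
disk_failure_policy: best_effort
```

With best_effort, the node can keep serving what it still has on the healthy disks, and you repair only the data that lived on the failed one.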
If you have the resources for a fault-tolerant, better-performing RAID configuration (e.g. RAID10), you may prefer to use a single directory for simplicity. Most deployments are starting to lean toward the density route, using JBOD rather than system-level RAID.
You can read about the reasoning behind this design here: https://issues.apache.org/jira/browse/CASSANDRA-4292
zznate