How to copy a keyspace within a cluster

I have a keyspace filled with data that was expensive to generate. I want two copies of this data in my cluster, in two keyspaces: let's call them mydata and mydatabackup, both containing the same data (I do not mind if the Cassandra timestamps differ).

Is there an easy way to do this? The closest thing I can find to an answer is to use sstable2json and json2sstable, as suggested in response to a similar question. Is there a better way?

cassandra
1 answer

"Is there a better way?"

All Cassandra data is stored in the data/ folder (check the data_file_directories setting in cassandra.yaml). You can also check the saved_caches_directory and commitlog_directory settings.

In the data folder you will have:

  • One folder for each keyspace
  • One folder for the system keyspace
  • Some folders for authentication, etc.

Inside each keyspace folder you will have:

  • *-Data.db files containing your actual data
  • *-Filter.db files for the Bloom filters
  • *-Index.db files for the index
  • ...
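
To see this layout on a node, a small script can walk the data folder and list the *-Data.db files found under each keyspace folder. This is a minimal sketch: the function name list_sstable_data_files is made up for illustration, and the path you pass in is assumed to be one of the entries from data_file_directories.

```python
from pathlib import Path


def list_sstable_data_files(data_dir):
    """Map each keyspace folder name to the sorted names of its *-Data.db files.

    data_dir is assumed to be one entry of data_file_directories.
    """
    result = {}
    for keyspace_dir in sorted(Path(data_dir).iterdir()):
        if not keyspace_dir.is_dir():
            continue
        # rglob also covers layouts that keep one subfolder per table.
        names = sorted(p.name for p in keyspace_dir.rglob("*-Data.db"))
        result[keyspace_dir.name] = names
    return result
```

Calling list_sstable_data_files on your data folder returns a dictionary keyed by keyspace name; a keyspace with no SSTables on disk maps to an empty list.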

To replicate the data, make a plain copy of these folders.
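
A plain folder copy could be scripted roughly as follows. This is only a sketch: copy_keyspace_folder and the backup location are hypothetical names, and it copies the files on one node only. Whether the copied files can then be loaded under a different keyspace name depends on the Cassandra version and is not covered here.

```python
import shutil
from pathlib import Path


def copy_keyspace_folder(data_dir, keyspace, backup_root):
    """Copy data_dir/<keyspace> to backup_root/<keyspace> and return the new path.

    Raises FileExistsError if the destination already exists, so an
    earlier backup is never silently overwritten.
    """
    source = Path(data_dir) / keyspace
    destination = Path(backup_root) / keyspace
    shutil.copytree(source, destination)
    return destination
```

A script like this is the kind of thing that can be scheduled from crontab, with a fresh backup_root (for example, one folder per date) for each run.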

Our ops team uses crontab to schedule regular backups of Cassandra data this way.

Note: a copy made this way can miss live data that is still in memory and has not yet been flushed to disk. You can force a full compaction before backing up the data files, but a full compaction can hurt you, so be careful.
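
One way to reduce that risk, not mentioned in the original answer, is to ask the node to write its in-memory data to disk with nodetool flush before copying the files. The sketch below assumes nodetool is on the PATH and that the node is reachable at the given host; the helper names are made up for illustration.

```python
import subprocess


def nodetool_command(action, keyspace, host="localhost"):
    """Build the argument list for a nodetool call such as flush or compact."""
    return ["nodetool", "-h", host, action, keyspace]


def flush_keyspace(keyspace, host="localhost"):
    """Ask the node to write the keyspace's in-memory data to disk.

    Assumes nodetool is installed and the node is running.
    """
    subprocess.run(nodetool_command("flush", keyspace, host), check=True)
```

flush_keyspace("mydata") would be called right before the folder copy. Passing "compact" instead of "flush" to nodetool_command builds the full compaction call the note above warns about.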


Better answer: use the provided tool to take a snapshot of your database:

http://www.datastax.com/docs/1.0/operations/backup_restore
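
The tool in question is nodetool snapshot. As a rough sketch, it could be driven from a script like this, assuming nodetool is on the PATH; the function names and the tag value are placeholders, and the command has to be run on every node whose data you want to capture.

```python
import subprocess


def snapshot_command(keyspace, tag, host="localhost"):
    """Build the nodetool call that snapshots one keyspace under a named tag."""
    return ["nodetool", "-h", host, "snapshot", "-t", tag, keyspace]


def take_snapshot(keyspace, tag, host="localhost"):
    """Run the snapshot on one node; assumes nodetool is installed there."""
    subprocess.run(snapshot_command(keyspace, tag, host), check=True)
```

For example, take_snapshot("mydata", "before-copy") snapshots the mydata keyspace on the local node. The snapshot files land in a snapshots folder inside the data folder; check the linked documentation for the exact location in your version.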
