By default, a full repair is performed. The state of each node's data, and the differences between replicas, are stored in binary hash trees (Merkle trees). Rebuilding these trees on every run is a major factor in how long a repair takes. According to this DataStax blog entry: "Every time a repair is done, the tree must be computed; each node that participates in the repair must build its Merkle tree from all the sstables stored on it, making the calculation very expensive."
The only ways I know of to significantly speed up a full repair are to run it in parallel or to repair one subrange at a time. Your tag indicates that you are running Cassandra 2.0.
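To make the cost concrete, here is a minimal Python sketch of the idea behind a Merkle-tree comparison. It is not Cassandra's implementation: the bucketing scheme, the hash functions, and the names `leaf_hashes`, `merkle_root`, and `differing_buckets` are all assumptions chosen for illustration. The point is that every row has to be read and hashed to build the leaves, which is why recomputing the tree on each repair is expensive.

```python
import hashlib


def leaf_hashes(rows, buckets=4):
    """Hash every row into one of `buckets` leaves (a stand-in for token ranges)."""
    leaves = [hashlib.sha256() for _ in range(buckets)]
    for key in sorted(rows):
        # Hypothetical bucketing: a stable hash of the key selects the leaf.
        idx = int(hashlib.md5(key.encode()).hexdigest(), 16) % buckets
        leaves[idx].update(f"{key}={rows[key]}".encode())
    return [h.digest() for h in leaves]


def merkle_root(leaves):
    """Combine hashes pairwise, level by level, until one root hash remains."""
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last hash on odd levels
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0]


def differing_buckets(leaves_a, leaves_b):
    """Return the indexes of leaves whose hashes differ between two replicas."""
    return [i for i, (a, b) in enumerate(zip(leaves_a, leaves_b)) if a != b]


# Two replicas that disagree on a single row.
replica_a = {"k1": "v1", "k2": "v2", "k3": "v3"}
replica_b = {"k1": "v1", "k2": "stale", "k3": "v3"}

leaves_a = leaf_hashes(replica_a)
leaves_b = leaf_hashes(replica_b)
roots_match = merkle_root(leaves_a) == merkle_root(leaves_b)
mismatched = differing_buckets(leaves_a, leaves_b)
```

Comparing the two roots tells the replicas whether they differ at all, and comparing leaves narrows the difference down to the buckets that need to be streamed.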
1) Parallel full repair
nodetool repair -par (long form: --parallel)
According to the nodetool documentation for Cassandra 2.0:
Unlike sequential repairs (described above), parallel repairs construct the Merkle tables for all nodes at the same time. Therefore, no snapshots are required (or generated). Use parallel repairs to complete a repair quickly, or when you have downtime in which the repair can consume resources freely.
2) Subrange repair
nodetool accepts start and end token parameters, for example:
nodetool repair -st (start token) -et (end token) $keyspace $columnfamily
For simplicity, check out this Python script, which calculates the tokens for you and repairs each range: https://github.com/BrianGallew/cassandra_range_repair
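As a rough illustration of what such a script has to do, here is a small Python sketch that splits one token range into equal subranges and builds the corresponding nodetool commands. It is not taken from the script linked above; `split_range`, `repair_commands`, the keyspace and table names, and the example range 0 to 1000 are all hypothetical. A real tool would read each node's actual token ranges from the ring rather than use a fixed range.

```python
def split_range(start, end, steps):
    """Split the token range from `start` to `end` into `steps` contiguous subranges."""
    if steps < 1 or end <= start or steps > end - start:
        raise ValueError("need start < end and 1 <= steps <= end - start")
    width = end - start
    # Integer arithmetic keeps the bounds exact even for 64-bit tokens.
    bounds = [start + width * i // steps for i in range(steps + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def repair_commands(start, end, steps, keyspace, table):
    """Build one `nodetool repair -st ... -et ...` command per subrange."""
    return [
        f"nodetool repair -st {s} -et {e} {keyspace} {table}"
        for s, e in split_range(start, end, steps)
    ]


# Hypothetical range and names, for illustration only.
commands = repair_commands(0, 1000, 4, "my_keyspace", "my_table")
for command in commands:
    print(command)
```

Each subrange covers less data, so the Merkle tree built for it is cheaper to compute, and the commands can be run one after another or spread over time.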
Let me point out two alternative options:
A) Incremental repairs, as Jeff Gears pointed out.
They are available starting with Cassandra 2.1. You need to perform certain migration steps before you can run nodetool as follows:
nodetool repair -inc (long form: --incremental)
B) OpsCenter Repair Service
For a couple of clusters at my company, itembase.com, we use the Repair Service in DataStax OpsCenter, which runs and manages small range repairs as a service.
omnibear