How to understand bloom_filter_fp_chance and read_repair_chance in Cassandra

Bloom filters

When data is requested, the Bloom filter checks if the row exists before doing disk I/O. 

Read Repair

 Read repair performs a digest query on all replicas for that key.

My confusion is about how to set these values between 0 and 1. What happens when a value changes?

Thanks in advance,

1 answer

bloom_filter_fp_chance and read_repair_chance control two different things. Usually you leave them at their default values, which should work well for most typical use cases.

bloom_filter_fp_chance controls the accuracy of the Bloom filters kept for the SSTables stored on disk. The Bloom filter is held in memory, and when you read, Cassandra checks the Bloom filters to see which SSTables might contain data for the key you are reading. A Bloom filter can give false positives: when Cassandra actually reads the SSTable, it turns out that the key is not there and the read was wasted effort. The higher the accuracy of the Bloom filter, the fewer false positives it gives (but the more memory it takes).

From the documentation:

 0    Enables the unmodified, effectively the largest possible, Bloom filter.
 1.0  Disables the Bloom filter.
 The recommended setting is 0.1. A higher value yields diminishing returns.

Thus, a larger number gives a higher probability of a false positive (fp) when checking a Bloom filter.
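To make the accuracy-versus-memory trade-off concrete, here is a rough sketch using the textbook sizing formula for an optimally configured Bloom filter, bits per key = -ln(p) / (ln 2)^2. This is only an illustration: the function names are made up for this example, and Cassandra's own sizing logic may differ from the textbook formula.

```python
import math


def bloom_bits_per_key(fp_chance: float) -> float:
    """Bits per key for an optimally sized textbook Bloom filter.

    Assumption: the standard formula -ln(p) / (ln 2)^2, not Cassandra's
    actual implementation.
    """
    if not 0.0 < fp_chance < 1.0:
        raise ValueError("fp_chance must be strictly between 0 and 1")
    return -math.log(fp_chance) / (math.log(2) ** 2)


def bloom_memory_mib(num_keys: int, fp_chance: float) -> float:
    """Approximate filter size in MiB for num_keys keys (hypothetical helper)."""
    bits = num_keys * bloom_bits_per_key(fp_chance)
    return bits / 8 / (1024 ** 2)


if __name__ == "__main__":
    # Compare a few false-positive targets for 100 million keys.
    for p in (0.001, 0.01, 0.1, 0.5):
        print(
            f"fp_chance={p:<6} "
            f"bits/key={bloom_bits_per_key(p):6.2f} "
            f"memory={bloom_memory_mib(100_000_000, p):8.1f} MiB"
        )
```

Under this formula, moving from 0.1 to 0.01 doubles the bits needed per key, which is why a lower fp chance costs more memory.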

read_repair_chance controls the probability that a read of a key is also checked against the other replicas for that key. This is useful if your system has frequent node downtime, which leaves replicas out of sync. If you do a lot of reads, read repair will gradually bring the data back into sync as you read, without the need to run a full repair on the nodes. Higher settings trigger more read repairs and consume more resources, but bring the data back into sync faster as you read.
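As a toy model of what the 0-to-1 value means, the sketch below treats read_repair_chance as a per-read coin flip that decides whether a read also triggers the cross-replica check. The function name and the simulation are assumptions for illustration only, not Cassandra code.

```python
import random


def simulate_reads(num_reads: int, read_repair_chance: float, seed: int = 0) -> int:
    """Count how many of num_reads would also trigger a replica check.

    Toy model: each read independently triggers the check with
    probability read_repair_chance.
    """
    if not 0.0 <= read_repair_chance <= 1.0:
        raise ValueError("read_repair_chance must be between 0 and 1")
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    return sum(1 for _ in range(num_reads) if rng.random() < read_repair_chance)


if __name__ == "__main__":
    for chance in (0.0, 0.1, 0.5, 1.0):
        checked = simulate_reads(10_000, chance)
        print(f"read_repair_chance={chance}: {checked} of 10000 reads checked")
```

In this model, 0 means no read ever triggers the extra check and 1 means every read does; values in between trade extra work per read for faster convergence.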

See the documentation for these options here.
