Merkle trees (aka hash trees) are used to synchronize data in both Cassandra and Dynamo.
As with any hash function, there is a possibility that different data will have the same hash value:
there exist x and y such that x != y but hash(x) = hash(y).
As "big data" grows in NoSQL systems, the likelihood of such a collision becomes higher.
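To put a rough number on how the likelihood grows, here is a minimal sketch of the standard birthday-bound approximation. It assumes an ideal hash function with uniformly distributed outputs; the function name `collision_probability` and the hash widths used below are illustrative assumptions, not figures taken from Cassandra or Dynamo.

```python
import math

def collision_probability(n_items: int, hash_bits: int) -> float:
    """Approximate probability that at least two of n_items distinct inputs
    share a hash value, assuming an ideal hash_bits-bit hash.

    Birthday approximation: p ~= 1 - exp(-n(n-1) / 2^(bits+1)).
    """
    exponent = n_items * (n_items - 1) / 2 ** (hash_bits + 1)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-exponent)

if __name__ == "__main__":
    for bits in (32, 64, 128):
        print(bits, collision_probability(10**9, bits))
```

The estimate grows roughly with the square of the number of items and shrinks exponentially with the hash width, so the result depends heavily on which hash a given system uses.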
My concern is that, as the data sets grow larger, it becomes almost certain that two nodes in the Merkle tree covering different data will end up with the same parent hash.
In this case, when two different machines in the cluster compare their Merkle trees, they will falsely conclude that their data is consistent. If no more data is written to that branch of the tree, the machines will remain out of sync forever.
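The comparison described above can be sketched as follows. This is only an illustration of the general technique, not the actual implementation in Cassandra or Dynamo: it assumes SHA-256, two replicas holding the same number of leaf values, and the hypothetical helper names `leaf_hash`, `parent_hash`, `subtree_hash`, and `diff_leaves`.

```python
import hashlib

def leaf_hash(value: bytes) -> bytes:
    # Prefix byte keeps leaf hashes distinct from inner-node hashes.
    return hashlib.sha256(b"\x00" + value).digest()

def parent_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()

def subtree_hash(values, lo, hi):
    """Hash of the subtree covering values[lo:hi] (hi - lo >= 1)."""
    if hi - lo == 1:
        return leaf_hash(values[lo])
    mid = (lo + hi) // 2
    return parent_hash(subtree_hash(values, lo, mid),
                       subtree_hash(values, mid, hi))

def diff_leaves(a, b, lo=0, hi=None):
    """Return indices of leaves that differ between replicas a and b.

    Hashes are recomputed on every call for brevity; a real system
    would build the tree once and cache the node hashes.
    """
    if hi is None:
        if len(a) != len(b):
            raise ValueError("sketch assumes equally sized replicas")
        hi = len(a)
    if hi == lo:
        return []
    if subtree_hash(a, lo, hi) == subtree_hash(b, lo, hi):
        # Equal hashes: the subtree is assumed identical and skipped.
        # A hash collision here is exactly the case the question asks about.
        return []
    if hi - lo == 1:
        return [lo]
    mid = (lo + hi) // 2
    return diff_leaves(a, b, lo, mid) + diff_leaves(a, b, mid, hi)

if __name__ == "__main__":
    replica_a = [b"k1=v1", b"k2=v2", b"k3=v3", b"k4=v4"]
    replica_b = [b"k1=v1", b"k2=v2", b"k3=STALE", b"k4=v4"]
    print(diff_leaves(replica_a, replica_b))
```

The early return on equal hashes is what makes the comparison cheap, and it is also the only place where a collision could hide a real difference between the replicas.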
How is this handled?
eshalev