To make this simple, my question is: how do I hash a string (about 200 characters) as quickly as possible? Security is not important, but collisions are a big deal.
Note: After a quick investigation, it seems that MurmurHash3 might be the best choice. I am open to any comment saying otherwise.
Firstly, I know that there are many other similar questions, but so far I have not found a convincing answer.
I have a list of objects, each of which contains a list of approximately 3k paragraphs, stored in a database. Every X hours these paragraphs are updated, and I need to find out whether any of them have changed and, if so, push only the changed paragraphs.
The fastest way to find the differences (knowing that most of the time the content will be identical) is to build a Merkle tree, save it in the database, and walk the tree to find the differences instead of comparing the paragraphs themselves.
This means that in my case I will create ten thousand hashes per second to compare with what is in the database. So I need a very efficient way to create these hashes. I don't care about security, I just need to make sure that the number of collisions remains very low.
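To make the requirement concrete, here is a minimal sketch of a fast non-cryptographic string hash using only the JDK. It uses 64-bit FNV-1a as a stand-in, not MurmurHash3, and the class name `Fnv1a64` is hypothetical. As a rough guide, for n hashes of b bits the chance of at least one collision is about n^2 / 2^(b+1), which is why a 64-bit or 128-bit output is worth considering over a 32-bit one when collisions matter.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library:
// 64-bit FNV-1a computed over the UTF-8 bytes of a string.
final class Fnv1a64 {
    private static final long OFFSET_BASIS = 0xcbf29ce484222325L;
    private static final long PRIME = 0x100000001b3L;

    static long hash(String text) {
        long h = OFFSET_BASIS;
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff); // mix in one byte (treated as unsigned)
            h *= PRIME;      // multiplication wraps modulo 2^64
        }
        return h;
    }

    public static void main(String[] args) {
        // Hash a sample paragraph and print it as hex.
        System.out.println(Long.toHexString(hash("a paragraph of about 200 characters")));
    }
}
```

The same `hash(String)` shape would work with another algorithm, so the hash function can be swapped without touching the comparison logic.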
What would be the best algorithm available in Java for this?
In my case, the main object consists of Sections, each of which consists of Languages, each of which consists of Paragraphs.
Comparison strategy:
1) If the hash of the object is identical, stop; otherwise go to 2).
2) Loop over all sections and keep only the sections with a different hash.
3) Loop over all languages of these sections and keep only the languages with a different hash.
4) Loop over all paragraphs of these languages; if the hash is different, push the new content.
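The comparison strategy above can be sketched roughly as follows. This is only an illustration under some assumptions: the names `MerkleDiff`, `Node`, `leaf`, `parent`, and `collectChanged` are hypothetical, the hash is 64-bit FNV-1a as a stand-in for whichever algorithm is chosen, children are matched by position, and removed paragraphs are not reported.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: names and structure are illustrative, not from any library.
final class MerkleDiff {

    static final class Node {
        final String name;
        final long hash;
        final List<Node> children; // empty for a paragraph (leaf)

        Node(String name, long hash, List<Node> children) {
            this.name = name;
            this.hash = hash;
            this.children = children;
        }
    }

    // Stand-in hash (64-bit FNV-1a); swap in another algorithm if preferred.
    static long fnv1a64(byte[] data) {
        long h = 0xcbf29ce484222325L;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        return h;
    }

    // A paragraph: its hash is the hash of its text.
    static Node leaf(String name, String text) {
        return new Node(name, fnv1a64(text.getBytes(StandardCharsets.UTF_8)),
                Collections.<Node>emptyList());
    }

    // Object, section or language: its hash covers the children's hashes, in order.
    static Node parent(String name, List<Node> children) {
        ByteBuffer buf = ByteBuffer.allocate(children.size() * Long.BYTES);
        for (Node child : children) {
            buf.putLong(child.hash);
        }
        return new Node(name, fnv1a64(buf.array()), children);
    }

    // Steps 1-4: stop at any identical subtree, descend only into differing ones.
    // A null oldNode means the node is new, so everything below it counts as changed.
    static void collectChanged(Node oldNode, Node newNode, String path, List<String> out) {
        if (oldNode != null && oldNode.hash == newNode.hash) {
            return; // identical hash: nothing below here changed
        }
        if (newNode.children.isEmpty()) {
            out.add(path); // a changed (or new) paragraph
            return;
        }
        for (int i = 0; i < newNode.children.size(); i++) {
            Node newChild = newNode.children.get(i);
            Node oldChild = (oldNode != null && i < oldNode.children.size())
                    ? oldNode.children.get(i) : null;
            collectChanged(oldChild, newChild, path + "/" + newChild.name, out);
        }
    }

    public static void main(String[] args) {
        Node enBefore = parent("en", Arrays.asList(leaf("p1", "same"), leaf("p2", "draft A")));
        Node enAfter = parent("en", Arrays.asList(leaf("p1", "same"), leaf("p2", "draft B")));
        Node before = parent("object", Arrays.asList(parent("section1", Arrays.asList(enBefore))));
        Node after = parent("object", Arrays.asList(parent("section1", Arrays.asList(enAfter))));

        List<String> changed = new ArrayList<>();
        collectChanged(before, after, after.name, changed);
        System.out.println(changed); // paths of paragraphs whose content must be pushed
    }
}
```

In practice only the stored hashes would be loaded from the database for the old tree, so the paragraphs themselves are read only when a leaf hash differs.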