This answer relates to a class with a very high degree of complexity (in the worst case, it is a quintic, the expected case when it is fourth to check your database for the first time, and then quarter / cube to add a record), so it does not scale enough, to Unfortunately, there is not a much better answer that I can think of right now.
The algorithm is called the Ratcliff-Obershelp algorithm , it is implemented in python difflib . The algorithm itself is the worst case of cubic time and the quadratic expected. Then you must do this for every possible pair of records that is quadratic. Of course, adding a record is linear.
EDIT: Sorry, I am not reading the documentation correctly, difflib is quadratic, not cubic. Use it, not another algorithm.
Omnipotententity
source share