I have been using Double Metaphone and Caverphone2 to compare strings, and they work well on things like names and addresses (Caverphone2 works best for me). However, they produce too many false positives on numerical values such as phone numbers, IP addresses, and credit card numbers.
So I looked at Luhn and Verhoeff, and they come close to what I want, but not quite. They are good at validating a number against its check digit, but they don't seem to be designed for fuzzy matching. Is there anything that behaves like Luhn and Verhoeff, in that it catches single-digit errors and transpositions of two adjacent digits, but can be used for encoding and comparison in the same way as the fuzzy-string algorithms?
I would like to encode a number and then compare it with 100,000 other numbers to find close matches. For example, 7041234 should match 7041324 as a possible transcription error, but 4213704 should not.
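To make the requirement concrete, here is a minimal Python sketch of the matching behavior described above. It is a brute-force pairwise comparison, not the encoding scheme being asked about, and it assumes the numbers are plain digit strings of equal length. The function names `is_near_match` and `find_near_matches` are made up for illustration.

```python
def is_near_match(a: str, b: str) -> bool:
    """Return True if b equals a, or differs from a by exactly one
    substituted digit or one swap of two adjacent digits.

    Assumption: both inputs are digit strings; strings of different
    length are never treated as matches in this sketch.
    """
    if len(a) != len(b):
        return False

    # Positions where the two strings disagree.
    diffs = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]

    if len(diffs) <= 1:
        # Identical, or a single-digit substitution.
        return True

    if len(diffs) == 2:
        i, j = diffs
        # Adjacent transposition: the two digits are simply swapped.
        return j == i + 1 and a[i] == b[j] and a[j] == b[i]

    return False


def find_near_matches(target: str, candidates: list[str]) -> list[str]:
    """Linear scan over all candidates; fine for roughly 100,000 entries."""
    return [c for c in candidates if is_near_match(target, c)]


if __name__ == "__main__":
    print(find_near_matches("7041234", ["7041324", "4213704", "7041239"]))
```

With the numbers from the example, 7041324 is reported as a near match for 7041234 and 4213704 is not. What I am really after is an encoding that avoids this pairwise scan.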
Jeffg