Improved Levenshtein Algorithm

I recently implemented the levenshtein algorithm in our search engine database, but we ran into a problem.

According to the main levenshtein

Levenshtein ('123456', '12x456') has the same meaning as Levenshtein ('123456', '12345x')

This is usually normal, but this is not true for my particular problem. When someone uses our site, this is not true. Manufacturers of electronic components often make similar products with a difference in the very last letter. If the first letter is different, it usually represents a completely different category. Therefore, I need an algorithm that considers coincidences at the beginning of a word more valuable than those that are behind, or, in other words, inconsistencies that occur at the beginning should have a larger penalty than those indicated in the back.

If anyone has any ideas please let me know.

+7
source share
3 answers

We had a similar problem and we used the Brew method to edit the distance

We were in Perl, so we used the Text :: Brew library. My colleague made a nice presentation about using several different algorithms , including Brew.

+1
source

See Smith-Waterman algorithm widely used in bioinformatics. It can perform local alignment of your request, but it will be slower than Levenshtein.

+2
source

Use the Jaro-Winkler Distance ... This is exactly what you are asking for.

+2
source

All Articles