Improved Levenshtein Algorithm

Question

Improved Levenshtein Algorithm

I recently implemented the levenshtein algorithm in our search engine database, but we ran into a problem.

According to the main levenshtein

Levenshtein ('123456', '12x456') has the same meaning as Levenshtein ('123456', '12345x')

This is usually normal, but this is not true for my particular problem. When someone uses our site, this is not true. Manufacturers of electronic components often make similar products with a difference in the very last letter. If the first letter is different, it usually represents a completely different category. Therefore, I need an algorithm that considers coincidences at the beginning of a word more valuable than those that are behind, or, in other words, inconsistencies that occur at the beginning should have a larger penalty than those indicated in the back.

If anyone has any ideas please let me know.

+7

algorithm levenshtein distance

Mike d Oct 20 '11 at 20:49

source share

3 answers

See Smith-Waterman algorithm widely used in bioinformatics. It can perform local alignment of your request, but it will be slower than Levenshtein.

+2

Pierre Oct 20 '11 at 20:53

source share

Use the Jaro-Winkler Distance ... This is exactly what you are asking for.

+2

Will castillo Oct 26 '11 at 18:17

source share

Rob di marco · Accepted Answer · 2011-10-20T21:01:58+0000

We had a similar problem and we used the Brew method to edit the distance

We were in Perl, so we used the Text :: Brew library. My colleague made a nice presentation about using several different algorithms , including Brew.

Improved Levenshtein Algorithm

More articles: