Change Levenshtein distance for positional displacement

I use the Levenshtein distance algorithm to compare the company name provided as user input with a database of well-known brand names to find the closest match. The algorithm itself works fine, but I want to build the offset so that the editing distance is considered lower if the original parts of the lines match.

For example, if the search criterion is “ABCD”, then both “ABCD Co.” and “XYX ABCD” have the same editing distance. However, I want to add weight to the fact that the initial parts of the first line match the search criteria closer than the second line.

One way to do this could be to change the cost of inserting / deleting / replacing higher at the beginning of the lines and lower at the end. Does anyone have an example of successfully implementing this? Is using Levenshtein distance an even better way to do what I'm trying to achieve? Is my guess about the exact approach?

UPDATE: For my immediate purposes, I decided to abandon the above and instead use the Jaro Winkler editing distance, which seems to solve the problem. However, I will leave it open for further input.

+2
levenshtein distance distance
source share
1 answer

What you are looking for looks like Smith-Waterman local alignment: http://en.wikipedia.org/wiki/Smith%E2%80%93Waterman_algorithm

0
source share

All Articles