Levenshtein distance gives us a way to calculate the distance between two similar strings in terms of disordered individual characters:
quick brown fox
quikc brown fax
Levenshtein distance = 3.
Is there a similar algorithm for measuring the distance between two strings that share similar subsequences? For example, in
quickbrownfox
brownquickfox
the Levenshtein distance is 10, but this does not take into account the fact that the two strings share two identical subsequences ("quick" and "brown", merely swapped), which makes them more “similar” than completely disordered words like
quickbrownfox
qburiocwknfox
and yet this completely disordered version has a lower Levenshtein distance, only 8.
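For reference, the figures above can be reproduced with a minimal sketch of the standard dynamic-programming Levenshtein distance (insertions, deletions, and substitutions each cost 1). The function name `levenshtein` is just an illustrative choice here, not taken from any particular library.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance where insert, delete, and substitute each cost 1."""
    # prev[j] is the distance between the prefix of a processed so far and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or free if equal
            ))
        prev = curr
    return prev[-1]


print(levenshtein("quick brown fox", "quikc brown fax"))  # 3
print(levenshtein("quickbrownfox", "brownquickfox"))      # 10
print(levenshtein("quickbrownfox", "qburiocwknfox"))      # 8
```

The swapped-block string scores worse (10) than the fully interleaved one (8), which is exactly the behaviour I would like a better measure to avoid.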
What distance measures exist that take into account the length of subsequences without assuming that subsequences can easily be broken down into separate words?
algorithm
user181548