1. Brute force
Here is a Levenshtein distance implementation in Python.
def levenshtein_matrix(lhs, rhs): def move(index): return (index+1)%2 m = len(lhs) n = len(rhs) states = [range(n+1), [0,]*(n+1)] previous = 0 current = 1 for i in range(1, m+1): states[current][0] = i for j in range(1,n+1): add = states[current][j-1] + 1 sub = states[previous][j] + 1 repl = states[previous][j-1] + abs(cmp(lhs[i-1], rhs[j-1])) states[current][j] = min( repl, min(add,sub) ) previous = move(previous) current = move(current) return states[previous][n]
This is a typical dynamic programming algorithm, just taking advantage of the fact that since only the last line is needed, it is enough to save only two lines at a time.
For implementation in C ++, you can look at LLVM one (line 70-130), pay attention to the use of a dedicated array stack of a fixed size, replaced only if necessary by a dynamically allocated array.
I just can't keep track of your code to try to diagnose it ... so let it change the angle of attack. Instead of micro-optimizing the distance, we completely change the algorithm.
2. Improvement: using a dictionary
One of the challenges you face is that you can do much better.
The first observation is that the distance is symmetrical, although it does not change the total complexity, which will halve the required time.
Secondly, since you really have a dictionary of famous words, you can rely on this: "actor" and "actual" use a common prefix ("action"), and therefore you do not need to recompile the first steps.
This can be used with Trie (or any other sorted structure) to store your words. Then you take one word and calculate its distance relative to all the words stored in the dictionary using prefixes.
Take the example dic = ["actor", "actual", "addict", "atchoum"] , and we want to calculate the distance for word = "atchoum" (we will remove it from the dictionary at this point)
- Initialize the matrix for the word "atchoum":
matrix = [[0, 1, 2, 3, 4, 5, 6, 7]] - Choose the next word "actor"
- Prefix = "a", matrix =
[[0, 1, 2, 3, 4, 5, 6, 7], [1, 0, 1, 2, 3, 4, 5, 6]] - Prefix = "ac", matrix =
[[0, 1, 2, 3, 4, 5, 6, 7], [1, 0, 1, 2, 3, 4, 5, 6], [2, 1, 1, 2, 3, 4, 5, 6]] - Prefix = "act", matrix =
[[..], [..], [..], [..]] - Continue to the "actor", you have a distance
- Select the next word "relevant", rewind the matrix until the prefix becomes the prefix of our word, here to "act"
- Prefix = "act", matrix =
[[..], [..], [..], [..], [..]] - Continue until "actual"
- Keep using other words
The rewinding step is important here, saving the calculations performed for the previous word, with which you share a prefix of good length, you effectively save a lot of work.
Note that this is trivially implemented using a simple stack and does not require any recursive call.