I am trying to write a spell checker.
It downloads the text, builds a dictionary from a 16-megabyte file, and then checks whether each word it finds matches a word in the dictionary ("similar" meaning at most two character edits); if so, it replaces the word with the form from the dictionary.
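For context, the correction step is roughly equivalent to the sketch below (simplified; `correct_word`, `dictionary`, and `words` are placeholder names, not my exact code, and here it just picks the closest dictionary entry within two edits):

# simplified sketch of the correction step (placeholder names)
def correct_word(word, dictionary):
    best, best_dist = word, 3          # accept matches within 2 edits
    for candidate in dictionary:       # compares against every dictionary entry
        dist = levenshteinDistance(word, candidate)
        if dist < best_dist:
            best, best_dist = candidate, dist
    return best

corrected = [correct_word(w, dictionary) for w in words]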
Right now I'm using the Levenshtein distance algorithm, and processing a set of 50 words takes 3 minutes.
I'm sure there must be a faster solution. The profiler tells me that my application spends more than 80% of its time in the Levenshtein distance function.
Are there any better solutions / algorithms?
Here is my implementation of the algorithm:
def levenshteinDistance(s1, s2):
    l_s1 = len(s1)
    l_s2 = len(s2)
    # first row is 0..l_s2, first column is 0..l_s1, the rest starts at 0
    d = [[j if x == 0 else (x if j == 0 else 0) for j in xrange(l_s2 + 1)]
         for x in xrange(l_s1 + 1)]
    for i in xrange(1, l_s1 + 1):
        for j in xrange(1, l_s2 + 1):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1  # substitution cost
            d[i][j] = min(d[i - 1][j] + 1,           # deletion
                          d[i][j - 1] + 1,           # insertion
                          d[i - 1][j - 1] + cost)    # substitution
    return d[l_s1][l_s2]
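For reference, a quick sanity check of the function (expected values follow from the standard definition of edit distance):

# sanity check of the function above
print levenshteinDistance("kitten", "sitting")  # 3 (k->s, e->i, insert g)
print levenshteinDistance("flaw", "lawn")       # 2 (delete f, insert n)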