Look for similar words

I am trying to write a spell checker.

It downloads the text, creates a dictionary from a 16-megabyte file, and then checks if the found word matches the word in the dictionary (similar = changes to two characters), if so, it changes it to the form from the dictionary.

Now I'm using the Levenshtein Distance algorithm, and processing a set of 50 words takes 3 minutes ...

I am sure there should be a faster solution. The profiler told me that my application spends more than 80% of its time on the Levenshtein feature.

Are there any better solutions / algorithms?

Here is the implemented version of the algorithm used:

def levenshteinDistance(s1, s2):
    l_s1 = len(s1)
    l_s2 = len(s2)
    d = [[a for a in genM(x, l_s2 + 1)] for x in xrange(l_s1 + 1)]
    for i in xrange(1, l_s1 + 1):
        for j in xrange(1, l_s2 + 1):
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + decide_of_equality(s1[i - 1],s2[j - 1]))
    return d[l_s1][l_s2]
+5
source share
2 answers

Norvig, , .

, , . . , , , . -, map-reduce? , , , . 1 , ( ). 2 ( ). , .

, , - cpu C i.e, python.

+2

, . , , , . , ? : <= 2 , .

0

All Articles