I am trying to write a spell checker.
It downloads the text, builds a dictionary from a 16-megabyte file, and then checks whether each word it finds matches a word in the dictionary ("similar" meaning at most two character edits); if so, it replaces the word with the form from the dictionary.
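For context, the correction step is roughly equivalent to the sketch below (simplified; `correct_word`, `dictionary`, and `words` are placeholder names, not my exact code, and here it just picks the closest dictionary entry within two edits):

# simplified sketch of the correction step (placeholder names)
def correct_word(word, dictionary):
    best, best_dist = word, 3          # accept matches within 2 edits
    for candidate in dictionary:       # compares against every dictionary entry
        dist = levenshteinDistance(word, candidate)
        if dist < best_dist:
            best, best_dist = candidate, dist
    return best

corrected = [correct_word(w, dictionary) for w in words]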
Right now I'm using the Levenshtein distance algorithm, and processing a set of 50 words takes 3 minutes.
I'm sure there must be a faster solution. The profiler tells me that my application spends more than 80% of its time in the Levenshtein distance function.
Are there any better solutions / algorithms?
Here is my implementation of the algorithm:
def levenshteinDistance(s1, s2):
    l_s1 = len(s1)
    l_s2 = len(s2)
    # first row is 0..l_s2, first column is 0..l_s1, the rest starts at 0
    d = [[j if x == 0 else (x if j == 0 else 0) for j in xrange(l_s2 + 1)]
         for x in xrange(l_s1 + 1)]
    for i in xrange(1, l_s1 + 1):
        for j in xrange(1, l_s2 + 1):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1  # substitution cost
            d[i][j] = min(d[i - 1][j] + 1,           # deletion
                          d[i][j - 1] + 1,           # insertion
                          d[i - 1][j - 1] + cost)    # substitution
    return d[l_s1][l_s2]
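For reference, a quick sanity check of the function (expected values follow from the standard definition of edit distance):

# sanity check of the function above
print levenshteinDistance("kitten", "sitting")  # 3 (k->s, e->i, insert g)
print levenshteinDistance("flaw", "lawn")       # 2 (delete f, insert n)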