Spelling suggestions for concatenated words

I am working on a spell checker for a WYSIWYG web editor. I currently use the Damerau-Levenshtein distance algorithm to compile a list of spelling suggestions. This all works well, but I'm curious how I can improve the functionality.

In particular, my implementation does not currently handle concatenated words. For example, I would like to be able to detect "areyou" and suggest "are you". I think I can do this by breaking a potentially concatenated word at promising split points and checking both halves. Since all English words must have at least one vowel, I think I can look for vowels to help me decide where to split the word.

The Damerau-Levenshtein distance algorithm has been so useful that it is clear others have thought about this problem more than I have. Is there a similarly clever algorithm that I should consider for detecting concatenated words, or am I already on the right track?

+4
3 answers

I assume that a candidate concatenated word will not be longer than forty (40) characters or so; most of the time it will be shorter than ten (10).

Given the small size, how about something like this (Python)?

  # is_spelled_wrong() stands for your existing dictionary check
  if is_spelled_wrong(word):
      N = len(word)
      list_suggestions = []
      for i in range(1, N):        # split points 1 .. N-1
          wordA = word[0:i]        # Python slice notation
          wordB = word[i:N]        # start at i (not i + 1) so no character is dropped
          if not is_spelled_wrong(wordA) and not is_spelled_wrong(wordB):
              list_suggestions.append((wordA, wordB))

In other words, just scan the string for all possible split points. There are only a few of them. In the case of "areyou" you would have to check just five (5) times.

+3

Since you are already scanning the entire dictionary for each word, it would not be unreasonable to add common word pairs to the dictionary. Alternatively, you can split the input (the possibly concatenated word) into two words in every possible way, and then search the dictionary for near matches to each part. This is not as slow as it sounds: you can reuse the intermediate results of the Damerau-Levenshtein computation for the whole word to get the results for each prefix.
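To illustrate the reuse of intermediate results, here is a minimal sketch. It uses plain Levenshtein distance rather than full Damerau-Levenshtein (no transpositions) to keep it short, and the function name `prefix_distances` is made up for this example. The point is that the last row of the dynamic-programming table already holds the distance from a dictionary word to every prefix of the input, so one pass per dictionary word covers all split points.

```python
def prefix_distances(candidate: str, text: str) -> list[int]:
    """Return d where d[j] is the Levenshtein distance
    between candidate and text[:j], for every j in 0..len(text)."""
    # Row 0: turning "" into text[:j] takes j insertions.
    prev = list(range(len(text) + 1))
    for i, c in enumerate(candidate, start=1):
        # Column 0: turning candidate[:i] into "" takes i deletions.
        curr = [i]
        for j, t in enumerate(text, start=1):
            cost = 0 if c == t else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    # The final row covers the whole candidate against each prefix of text.
    return prev


print(prefix_distances("are", "areyou"))
```

The same idea should work for suffixes if you run it on the reversed strings, so each dictionary word can be scored against every left half and every right half without recomputing the table per split point.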

+1

Check out the excellent spelling check article. Using that technique, you have two options: either add every pair of words, or every likely pair of words, to the dictionary (with the separated words as the suggested correction), or try every possible split point and perform a standard dictionary lookup to see whether both words are valid.
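A rough sketch of how the two options could be combined, assuming a tiny in-memory dictionary. The names `WORDS`, `LIKELY_PAIRS`, `JOINED` and `suggest_split` are hypothetical and chosen only for illustration; a real word list and pair list would come from your own data.

```python
# Hypothetical toy data; replace with your real dictionary and pair list.
WORDS = {"a", "are", "you", "how", "thank"}
LIKELY_PAIRS = [("are", "you"), ("thank", "you")]

# Option 1: index each likely pair by its concatenated form.
JOINED = {a + b: (a, b) for a, b in LIKELY_PAIRS}


def suggest_split(word: str) -> list[tuple[str, str]]:
    """Return (left, right) suggestions for a possibly concatenated word."""
    # Known likely pair: answer with a single lookup.
    if word in JOINED:
        return [JOINED[word]]
    # Option 2: try every split point and keep those where both halves are valid.
    return [(word[:i], word[i:])
            for i in range(1, len(word))
            if word[:i] in WORDS and word[i:] in WORDS]


print(suggest_split("areyou"))
print(suggest_split("howare"))
```

The pair table costs memory but can rank common phrases first; the split scan needs no extra data and only does len(word) - 1 pairs of lookups.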

+1
