Multiple spelling corrections

Single-word error correction (for both non-word and real-word errors) is easy with the noisy channel model:

$$\hat{c} = \arg\max_{c} P(w \mid c)\, P(c)$$

where w is the misspelled word and c ranges over the candidate corrections we are trying to match, so each candidate is a single-word token.
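A minimal PHP sketch of that selection rule, for illustration only. The counts in `UNIGRAM_COUNTS` (standing in for P(c)) and the error model `exp(-levenshtein(w, c))` (standing in for P(w|c)) are assumptions; a real system would estimate both from data.

```php
<?php
// Hypothetical unigram counts; P(c) is each count divided by the total.
const UNIGRAM_COUNTS = [
    'apple' => 500,
    'apply' => 300,
    'ample' => 50,
];

// Assumed error model: P(w|c) decays exponentially with edit distance.
function error_prob(string $w, string $c): float
{
    return exp(-levenshtein($w, $c));
}

// Pick the candidate c that maximises P(w|c) * P(c).
function correct_word(string $w, array $counts): string
{
    $total = array_sum($counts);
    $best = $w;
    $bestScore = 0.0;
    foreach ($counts as $c => $n) {
        $score = error_prob($w, $c) * ($n / $total);
        if ($score > $bestScore) {
            $best = $c;
            $bestScore = $score;
        }
    }
    return $best;
}
```

With these toy counts, `correct_word('appel', UNIGRAM_COUNTS)` returns `apple`: `apple` and `apply` are both at edit distance 2, and `apple` has the higher prior.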

On Google, however, if you type something like spelligncheck, it corrects the single word into two separate words. Computing P(w|c) is still easy here if I use the Levenshtein distance, but it means a candidate can no longer be a single word (a single token, rather). Storing every multi-word combination as a candidate would make my dictionary grow exponentially.
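One way to avoid the exponential dictionary is to keep only single words and build multi-word candidates at query time. The sketch below is hypothetical (`WORDS`, `nearest_word` and `best_two_word_split` are illustrative names with a toy dictionary): it tries every split point of the input, corrects each half independently, and scores a split by the sum of the two Levenshtein distances. A token of length n has only n − 1 split points, so no word pairs need to be stored.

```php
<?php
// Hypothetical single-word dictionary: only unigrams are stored.
const WORDS = ['spell', 'spelling', 'check', 'checker', 'apple'];

// Return [closest dictionary word, its Levenshtein distance to $w].
function nearest_word(string $w, array $dict): array
{
    $best = null;
    $bestDist = PHP_INT_MAX;
    foreach ($dict as $c) {
        $d = levenshtein($w, $c);
        if ($d < $bestDist) {
            $best = $c;
            $bestDist = $d;
        }
    }
    return [$best, $bestDist];
}

// Try every split point and correct the left and right halves separately.
// Returns [[left, right], total edit distance] for the cheapest split.
function best_two_word_split(string $token, array $dict): array
{
    $bestPair = [];
    $bestCost = PHP_INT_MAX;
    for ($i = 1; $i < strlen($token); $i++) {
        [$left, $dl] = nearest_word(substr($token, 0, $i), $dict);
        [$right, $dr] = nearest_word(substr($token, $i), $dict);
        if ($dl + $dr < $bestCost) {
            $bestPair = [$left, $right];
            $bestCost = $dl + $dr;
        }
    }
    return [$bestPair, $bestCost];
}
```

Here `best_two_word_split('spelligncheck', WORDS)` returns `[['spelling', 'check'], 2]`. This only considers two-word splits and ignores P(c); a fuller version would also weight each half by its prior, as in the noisy channel formula, and compare the best split against the best single-word correction.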

Moreover, when I type app le, Google corrects it to apple.
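Going the other way, splitting the query on spaces and gluing adjacent tokens back together handles cases like app le with the same single-word dictionary. This is a hypothetical greedy sketch (`VOCAB` and `merge_split_words` are illustrative names): it merges two neighbouring tokens whenever their concatenation is a dictionary word, without weighing that against keeping them apart.

```php
<?php
// Hypothetical single-word dictionary.
const VOCAB = ['app', 'apple', 'pie'];

// Greedily merge adjacent tokens whose concatenation is in the dictionary.
function merge_split_words(array $tokens, array $vocab): array
{
    $lookup = array_flip($vocab); // word => index, for O(1) membership tests
    $out = [];
    $i = 0;
    $n = count($tokens);
    while ($i < $n) {
        if ($i + 1 < $n && isset($lookup[$tokens[$i] . $tokens[$i + 1]])) {
            $out[] = $tokens[$i] . $tokens[$i + 1];
            $i += 2;
        } else {
            $out[] = $tokens[$i];
            $i += 1;
        }
    }
    return $out;
}
```

For example, `merge_split_words(explode(' ', 'app le pie'), VOCAB)` returns `['apple', 'pie']`.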

So what is the best way to correct spelling across multiple words when the vocabulary contains only single words?


I think you are looking for something like pspell.

I put together this demo to show how you can get close to what you want; it can obviously be improved a lot:

<?php

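class SpellChecker
{
    // pspell dictionary handle returned by pspell_new()
    private $pspell;

    public function __construct($lang)
    {
        $this->pspell = pspell_new($lang);
        if ($this->pspell === false) {
            throw new RuntimeException("Could not load pspell dictionary for '$lang'");
        }
    }

    public function check($word)
    {
        return pspell_check($this->pspell, $word);
    }

    public function closest_suggestion($word)
    {
        $suggestions = pspell_suggest($this->pspell, $word);

        // pspell may have no suggestions at all.
        if (empty($suggestions)) {
            return null;
        }

        // Keep only suggestions that sound like the input (same metaphone key).
        $similar_sounding_words = array_filter($suggestions,
            function ($current_word) use ($word) {
                return metaphone($current_word) === metaphone($word);
            });

        // No similar sounding words, just return the first suggestion...
        if (count($similar_sounding_words) === 0) {
            return $suggestions[0];
        }

        // Return the similar sounding word with the smallest Levenshtein distance.
        return array_reduce($similar_sounding_words,
            function ($prev, $next) use ($word) {
                // array_reduce starts with null, so the first word wins by default.
                if ($prev === null) {
                    return $next;
                }
                return (levenshtein($prev, $word) < levenshtein($next, $word))
                    ? $prev
                    : $next;
            });
    }
}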

$spellchecker = new SpellChecker('en');

foreach (array('spelligncheck', 'app le') as $word) {
    if (!$spellchecker->check($word)) {
        print "Closest match for \"$word\": {$spellchecker->closest_suggestion($word)}\n";
    }
}

I tried it here and got the following result:

Closest match for "spelligncheck": spellchecker
Closest match for "app le": apple
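If pspell is not installed, the filter-then-rank step from closest_suggestion() can be tried on its own with a hard-coded suggestion list. `rank_suggestions` is a hypothetical standalone variant of that method, not part of pspell, and `$suggestions` is assumed to be a zero-indexed list like the one pspell_suggest() returns.

```php
<?php
// Standalone version of the metaphone filter + Levenshtein ranking.
function rank_suggestions(string $word, array $suggestions): ?string
{
    if (empty($suggestions)) {
        return null;
    }
    // Keep suggestions with the same metaphone key as the input.
    $similar = array_filter($suggestions, function ($s) use ($word) {
        return metaphone($s) === metaphone($word);
    });
    // Nothing sounds alike: fall back to the first suggestion.
    if (count($similar) === 0) {
        return $suggestions[0];
    }
    // Otherwise pick the similar-sounding word closest in edit distance.
    $best = null;
    foreach ($similar as $s) {
        if ($best === null || levenshtein($s, $word) < levenshtein($best, $word)) {
            $best = $s;
        }
    }
    return $best;
}
```

For example, `rank_suggestions('aple', ['apply', 'apple', 'ample'])` returns `apple`, the similar-sounding suggestion closest in edit distance.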

Good luck! :)
