Intelligent Spell Checker

I use NHunspell to check a string for spelling errors, like this:

var words = content.Split(' ');
string[] incorrect;
using (var spellChecker = new Hunspell(affixFile, dictionaryFile))
{
    incorrect = words.Where(x => !spellChecker.Spell(x))
        .ToArray();
}

This generally works, but it has some problems. For example, if I check the sentence “This is a (very good) example”, it reports “(very” and “good)” as misspelled. If the string contains a time such as "8:30", it reports that as a misspelled word too. It also has problems with commas, etc.

Microsoft Word is smart enough to recognize a time, a fraction, or a comma-separated list of words. It knows when not to use the English dictionary and when to ignore symbols. How can I get similar, more intelligent spell checking in my software? Are there any libraries that provide a bit more intelligence?

EDIT: I don't want to force users to install Microsoft Word on their computer, so using COM interop is not an option.

3 answers

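Pre-tokenize the input so that only the words reach the spell checker, without the surrounding punctuation (parentheses, commas). I don't know C#/.NET well, but in Python a simple RE such as \w+ does it: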

>>> s = "This is a (very good) example"
>>> re.findall(r"\w+", s)
['This', 'is', 'a', 'very', 'good', 'example']

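.NET should have something very similar. According to the .NET docs, \w is supported, so you only need to find the .NET equivalent of re.findall.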
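
The same idea can be pushed a little further to cover the other cases from the question. The following is a minimal Python sketch, not a definitive implementation: it assumes that a token counts as a word only if it consists of letters with optional internal apostrophes, and that any whitespace-delimited chunk containing a digit (a time such as 8:30, a fraction such as 1/2) should be skipped rather than checked. The names tokenize and find_misspelled, and the is_known callback standing in for a real checker such as Hunspell's Spell, are hypothetical.

```python
import re

# A word is a run of letters, optionally joined by internal apostrophes
# (e.g. "don't"). Digits and underscores are deliberately excluded.
WORD_RE = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)*")

# Whitespace-delimited chunks containing a digit (times, fractions,
# plain numbers) are skipped entirely instead of being spell-checked.
HAS_DIGIT_RE = re.compile(r"\d")


def tokenize(text):
    """Return the spell-checkable words in text, in order."""
    words = []
    for chunk in text.split():
        if HAS_DIGIT_RE.search(chunk):
            continue  # "8:30", "1/2", "42" are not dictionary words
        words.extend(WORD_RE.findall(chunk))
    return words


def find_misspelled(text, is_known):
    """Return the words for which the is_known callback is false."""
    return [w for w in tokenize(text) if not is_known(w)]
```

With this sketch, tokenize("This is a (very good) example at 8:30") returns ['This', 'is', 'a', 'very', 'good', 'example', 'at']. Skipping every chunk that contains a digit is a blunt rule (it also drops tokens like "3rd"), so treat it as a starting point.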

+6
using System.Text.RegularExpressions;
...
// matches any occurrence of ( or ); inside a character class they need no escaping
string pattern = "[()]";
for (int i = 0; i < incorrect.Length; i++)
{
    // Regex.Replace returns a new string, so store the result
    incorrect[i] = Regex.Replace(incorrect[i], pattern, String.Empty);
}

The regex may need adjusting. Either way, strip such characters first, then pass the words to Hunspell :)
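
The trimming step can also be done without a regex, by keeping the whitespace split and removing punctuation from both ends of each chunk before it is checked. A rough Python sketch of that idea, assuming ASCII punctuation only; clean_words is a hypothetical helper name:

```python
import string


def clean_words(text):
    """Split on whitespace and trim punctuation from both ends of each chunk."""
    cleaned = []
    for chunk in text.split():
        word = chunk.strip(string.punctuation)
        if word:  # drop chunks that were punctuation only, e.g. "--"
            cleaned.append(word)
    return cleaned
```

Here "(very" and "good)" become "very" and "good", but internal punctuation is left alone, so "8:30" comes through unchanged and still needs separate handling.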

0

In C#, something like this would do it.

using System.Linq;
using System.Text.RegularExpressions;

public static class ExtensionHelper
{
    public static string[] GetWords(this string input)
    {
        MatchCollection matches = Regex.Matches(input, @"\b[\w']*\b");

        var words = from m in matches.Cast<Match>()
                    where !string.IsNullOrEmpty(m.Value)
                    select TrimSuffix(m.Value);

        return words.ToArray();
    }

    public static string TrimSuffix(this string word)
    {
        int apostropheLocation = word.IndexOf('\'');
        if (apostropheLocation != -1)
        {
            word = word.Substring(0, apostropheLocation);
        }

        return word;
    }
}

var numberOfMistakes = content.GetWords().Where(x => !hunspell.Spell(x)).Count();

0
