I am doing text processing on a large database to create indicator variables that flag whether certain phrases appear in each observation's comment field. The comments were entered by technical experts, so the terminology is consistent.
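For context, this is roughly what the indicator step looks like. It is a minimal sketch: the data frame `obs`, its columns, and the phrase "valve leakage" are made up for illustration and are not my real data.

```r
# Hypothetical example data; the real comment field is much larger.
obs <- data.frame(
  id = 1:3,
  comment = c("Valve leakage observed at inlet",
              "No issues found",
              "valve leakge near the flange"),
  stringsAsFactors = FALSE
)

# Indicator: TRUE when the exact phrase appears (case-insensitive, literal match).
obs$has_valve_leakage <- grepl("valve leakage", tolower(obs$comment), fixed = TRUE)

# Only the first row is flagged; the third row has the phrase with a typo.
print(obs[, c("id", "has_valve_leakage")])
```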
However, there are cases where a technician misspelled a word, so my `grepl()` call does not detect that the phrase (although misspelled) occurred in the observation. Ideally, I would like to pass each word of a phrase to a function that returns several common misspellings or typos of that word. Does such an R function exist?
With this, I could search the comment field for all possible combinations of these misspellings of the phrase and collect the matches in a separate data frame. I could then examine each occurrence case by case to determine whether the phenomenon I am interested in was actually described by the technician.
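To make the idea concrete, here is a rough base R sketch of the kind of helper I have in mind. The function names `typo_variants` and `phrase_variants`, the example phrase, and the sample comments are all hypothetical, and the sketch only covers three simple typo types (a dropped letter, two swapped adjacent letters, a doubled letter), so it is an illustration of the idea rather than a complete solution. It also assumes the phrase contains only letters and single spaces, since the variants are used directly as regular expressions.

```r
# Hypothetical helper: the word itself plus simple single-edit typos of it.
typo_variants <- function(word) {
  chars <- strsplit(word, "", fixed = TRUE)[[1]]
  n <- length(chars)
  if (n < 2) return(word)  # too short to generate meaningful variants
  # Drop one letter, e.g. "leakage" -> "leakge"
  deletions <- vapply(seq_len(n), function(i) {
    paste(chars[-i], collapse = "")
  }, character(1))
  # Swap two adjacent letters, e.g. "leakage" -> "laekage"
  swaps <- vapply(seq_len(n - 1), function(i) {
    x <- chars
    x[c(i, i + 1)] <- x[c(i + 1, i)]
    paste(x, collapse = "")
  }, character(1))
  # Double one letter, e.g. "leakage" -> "leakkage"
  doubles <- vapply(seq_len(n), function(i) {
    paste(append(chars, chars[i], after = i), collapse = "")
  }, character(1))
  unique(c(word, deletions, swaps, doubles))
}

# Hypothetical helper: every combination of word variants for a phrase.
phrase_variants <- function(phrase) {
  words <- strsplit(phrase, " ", fixed = TRUE)[[1]]
  combos <- expand.grid(lapply(words, typo_variants), stringsAsFactors = FALSE)
  do.call(paste, unname(as.list(combos)))
}

# Made-up comments standing in for the real comment field.
comments <- c("Valve leakage observed at inlet",
              "No issues found",
              "valve leakge near the flange")
variants <- phrase_variants("valve leakage")

# For each comment, record which variants occur as whole words.
matches <- lapply(seq_along(comments), function(i) {
  txt <- tolower(comments[i])
  hit <- vapply(variants, function(v) {
    grepl(paste0("\\b", v, "\\b"), txt, perl = TRUE)
  }, logical(1))
  if (!any(hit)) return(NULL)
  data.frame(row = i, matched = variants[hit], comment = comments[i],
             stringsAsFactors = FALSE)
})

# Separate data frame of candidate matches to review by hand.
review <- do.call(rbind, Filter(Negate(is.null), matches))
print(review)
```

The word-boundary anchors (`\\b`) are there so that a variant such as "alve leakage" does not count as a match inside a correctly spelled "valve leakage".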
I have Googled around, but I only found links to actual spell-checker packages for R. What I'm looking for is a "reverse" spell-checker. Since the number of phrases I'm searching for is relatively small, I could realistically list the misspellings by hand; I just thought it would be nice if this capability were built into an R package for future text-processing efforts.
Thank you for your time!
r spell-checking text-mining tm
Nick evans