I got the same error when using tm v0.6. I suspect this is happening because stemCompletion is not among the default transformations in this version of the tm package:
> getTransformations
function ()
c("removeNumbers", "removePunctuation", "removeWords", "stemDocument",
    "stripWhitespace")
<environment: namespace:tm>
The tolower function has the same problem in this version, but it can be made to work by wrapping it in the content_transformer function. I tried a similar approach for stemCompletion but was not successful.
Note that even though stemCompletion is not a default transformation, it still works when stemmed words are passed to it directly:
> stemCompletion("compani", dictCorpus)
    compani
"companies"
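As far as I can tell, stemCompletion completes a stem by looking for dictionary words that start with it and, with type = "shortest", keeping the shortest of those. The base-R sketch below imitates that behaviour so it can be tried without tm; the function name shortest_completion and the tiny dictionary are made up for illustration, and this is not tm's actual source. It also shows that a stem with no prefix match comes back as NA:

```r
# Hypothetical base-R imitation of stemCompletion(type = "shortest"); not tm's
# actual code. Each stem is compared against the dictionary as a fixed-string
# prefix, the shortest matching word wins, and a stem with no match gives NA.
shortest_completion <- function(stems, dictionary) {
  vapply(stems, function(stem) {
    matches <- dictionary[startsWith(dictionary, stem)]
    if (length(matches) == 0) NA_character_ else matches[which.min(nchar(matches))]
  }, character(1))
}

dictionary <- c("companies", "company", "copany", "crude", "oil")
res <- shortest_completion(c("compani", "copani", "crude"), dictionary)
print(res)
# compani -> "companies", copani -> NA, crude -> "crude"
```

Note that "company" is not a candidate for "compani", because "compani" is not a prefix of "company".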
So that I could continue my work, I manually split each document in the corpus on single spaces, passed the pieces through stemCompletion, and pasted them back together with the following (clunky and not graceful!) function:
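# Split a stemmed document on single spaces, complete each stem against the
# unstemmed dictionary corpus, then paste the words back into one document.
stemCompletion_mod <- function(x, dict = dictCorpus) {
  PlainTextDocument(stripWhitespace(paste(
    stemCompletion(unlist(strsplit(as.character(x), " ")),
                   dictionary = dict, type = "shortest"),
    sep = "", collapse = " ")))
}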
where dictCorpus is just a copy of the cleaned corpus, taken before it was stemmed. The extra stripWhitespace call is specific to my corpus, but is most likely harmless for a general corpus. You can change the type argument from "shortest" as needed.
As a complete example, let's set up a dummy corpus using the crude data from the tm package:
> library(tm)
> data("crude")
> docs <- Corpus(VectorSource(crude))
> docs <- tm_map(docs, content_transformer(tolower))
> docs <- tm_map(docs, removeNumbers)
> docs <- tm_map(docs, removeWords, stopwords("english"))
> docs <- tm_map(docs, removePunctuation)
> docs <- tm_map(docs, stripWhitespace)
> docs <- tm_map(docs, PlainTextDocument)
> dictCorpus <- docs   # unstemmed copy, used as the completion dictionary
> docs <- tm_map(docs, stemDocument)
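Note: this example is odd, because the misspelled word "copany" ends up mapped as "copany" -> "copani" -> "NA". Presumably the stemmer turns the final "y" into "i", and since no word in the dictionary starts with "copani", stemCompletion has nothing to complete it to; paste() then turns the resulting NA into the string "NA". Not sure how to fix it...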
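One possible workaround for the literal "NA" (my own suggestion, not something tm does for you) is to fall back to the stem itself whenever no completion is found. Below is a self-contained base-R sketch of the split / complete / re-join idea with that fallback added. The function complete_document and the prefix matching inside it are hypothetical stand-ins for stemCompletion, not tm's API. It will not recover "copany", but "copani" survives instead of becoming "NA":

```r
# Hypothetical sketch (not tm's API): complete each stem in a document
# string by shortest prefix match, keeping the stem itself when nothing in
# the dictionary starts with it, then paste the words back together.
complete_document <- function(text, dictionary) {
  # split on runs of whitespace across all elements and drop empty tokens,
  # since an empty stem would prefix-match every dictionary word
  stems <- unlist(strsplit(text, "[[:space:]]+"))
  stems <- stems[nzchar(stems)]
  completed <- vapply(stems, function(stem) {
    matches <- dictionary[startsWith(dictionary, stem)]
    if (length(matches) == 0) stem else matches[which.min(nchar(matches))]
  }, character(1), USE.NAMES = FALSE)
  paste(completed, collapse = " ")
}

dictionary <- c("companies", "copany", "crude", "oil", "prices")
out <- complete_document(" compani copani crude oil price ", dictionary)
print(out)  # "companies copani crude oil prices"
```

To use something like this inside tm, the idea would be to wrap it with content_transformer, e.g. tm_map(docs, content_transformer(complete_document), dictionary = dict_words), where dict_words is a hypothetical character vector of the unstemmed words from dictCorpus. I believe content_transformer passes the extra argument through, but I have not checked this against tm v0.6, so treat it as untested.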
To run stemCompletion_mod over the whole corpus, I just use sapply (or parSapply from the snow package).
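For the parSapply route, one thing that can trip you up is that worker processes start with an empty workspace. Anything the function relies on has to be made available on each worker first: for stemCompletion_mod, that means the tm package and dictCorpus, which the function picks up as a default argument. Below is a sketch using the parallel package that ships with R (it includes a version of snow's parSapply). The lookup table, complete_doc and the toy documents are hypothetical stand-ins for dictCorpus, stemCompletion_mod and the corpus:

```r
library(parallel)

# Toy stand-ins (hypothetical): a lookup table living in the global
# environment, used through a default argument the same way
# stemCompletion_mod uses dictCorpus.
completion_table <- c(compani = "companies", price = "prices")
complete_doc <- function(text, lookup = completion_table) {
  words <- unlist(strsplit(text, "[[:space:]]+"))
  hit <- words %in% names(lookup)
  words[hit] <- lookup[words[hit]]
  paste(words, collapse = " ")
}

docs_text <- c(doc1 = "compani price", doc2 = "crude price")

# Serial version: runs in this session, so the default argument is found.
serial <- sapply(docs_text, complete_doc)

# Parallel version: each worker starts with an empty global environment,
# so the table must be exported (for tm you would presumably also need
# clusterEvalQ(cl, library(tm)) and clusterExport(cl, "dictCorpus")).
cl <- makeCluster(2)
clusterExport(cl, "completion_table")
parallel_res <- parSapply(cl, docs_text, complete_doc)
stopCluster(cl)

print(parallel_res)
```

Without the clusterExport call, the workers would fail to find completion_table when evaluating the default argument.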
Perhaps someone with more experience than me can suggest a simpler way to get stemCompletion working with v0.6 of the tm package.