TermDocumentMatrix sometimes throws an error

Question

I am creating a word cloud based on tweets from various sports teams. This code runs successfully about 1 time in 10:

 handle <- 'arsenal'
 txt <- searchTwitter(handle,n=1000,lang='en')
 t <- sapply(txt,function(x) x$getText())
 t <- gsub('http.*\\s*|RT|Retweet','',t)
 t <- gsub(handle,'',t)
 t_c <- Corpus(VectorSource(t))
 tdm = TermDocumentMatrix(t_c,control = list(removePunctuation = TRUE,
                                             stopwords = stopwords("english"),
                                             removeNumbers = TRUE,
                                             content_transformer(tolower)))
 m = as.matrix(tdm)
 word_freqs = sort(rowSums(m), decreasing=TRUE)
 dm = data.frame(word=names(word_freqs), freq=word_freqs)
 wordcloud(dm$word, dm$freq, random.order=FALSE, colors=brewer.pal(8, "Dark2"),rot.per=0.5)

The other 9 times out of 10, it throws the following error:

 Error in simple_triplet_matrix(i = i, j = j, v = as.numeric(v), nrow = length(allTerms), :
   'i, j, v' different lengths
 In addition: Warning messages:
 1: In mclapply(unname(content(x)), termFreq, control) :
   all scheduled cores encountered errors in user code
 2: In simple_triplet_matrix(i = i, j = j, v = as.numeric(v), nrow = length(allTerms), :
   NAs introduced by coercion

Any ideas, guys? I have googled it, but found nothing that solves it. Keep in mind that I am completely new to R!

+7

r term-document-matrix word-cloud

Dan Sep 06 '14 at 10:31

3 answers

You probably ran the following line of code somewhere before calling DocumentTermMatrix.

 corpus = tm_map(corpus, PlainTextDocument)

This line of code converts every document in the corpus to a PlainTextDocument, which the DocumentTermMatrix function does not handle correctly.

Just rebuild the corpus from scratch and preprocess it again, skipping the above command, and it should work.
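For illustration, here is a minimal sketch of preprocessing without the PlainTextDocument step, assuming the tm package is installed. The sample strings and variable names are made up, and this uses VCorpus with content_transformer(tolower) as one common way to lowercase text while keeping the corpus structure intact; it is not necessarily what the asker's setup needs.

```r
library(tm)

# Made-up stand-ins for the tweet text
tweets <- c("Arsenal WIN 2-0 today!", "Great goal by Arsenal")

corpus <- VCorpus(VectorSource(tweets))

# Wrap base functions in content_transformer() so each document
# stays a tm document instead of becoming a bare character vector
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, removeWords, stopwords("english"))

tdm <- TermDocumentMatrix(corpus)
print(dim(tdm))
```

The point of the design is that every tm_map step returns documents of the same kind it received, so TermDocumentMatrix sees one column per original tweet.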

+2

Shivam shekhar May 08 '17 at 13:25

If you remove:

 corpus = tm_map(corpus, PlainTextDocument)

you also need to remove:

 t_c <- Corpus(VectorSource(t))

You will then get the correct output for TermDocumentMatrix.

0

kalpesh Jan 29 '18 at 12:37

Dan · Accepted Answer · 2014-09-06T10:59:32+0000

So, after a bit of experimenting, the following line of code completely fixed my problem:

 t <- iconv(t,to="utf-8-mac")
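A note for anyone not on a Mac: as far as I know, "utf-8-mac" is an encoding name that only the macOS iconv provides, so that exact line may return NA elsewhere. Below is a rough portable sketch in base R, assuming the tweets are meant to be UTF-8 and that it is acceptable to drop emoji and other non-ASCII bytes. The sample strings are made up.

```r
# Made-up sample tweets: one with an emoji (valid UTF-8, not ASCII),
# one with a stray Latin-1 byte that is invalid as UTF-8
t <- c("Arsenal win! \xf0\x9f\x98\x80", "caf\xe9 after the match")

# Convert to ASCII; sub = "" drops bytes that cannot be converted
# instead of turning the whole string into NA
t_clean <- iconv(t, from = "UTF-8", to = "ASCII", sub = "")

# tolower() can now run without hitting invalid multibyte input
t_lower <- tolower(t_clean)
print(t_lower)
```

If the accented characters matter for the word cloud, converting to "UTF-8" with a sub value instead of to "ASCII" would keep more of the text; which one is appropriate depends on the data.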
