Frequency per Term - R tm DocumentTermMatrix

I am very new to R and cannot quite get my head around DocumentTermMatrix. I have a DocumentTermMatrix created with the tm package. It holds the terms and their frequencies, but I cannot figure out how to access them.

Ideally, I would like to:

    Term    #
    "the"   200
    "is"    400
    "a"     200

My current code is:

    library(tm)

    common.words <- c("amp", "@RT", "I", "http", "https", stopwords("english"), "you")

    x <- Corpus(VectorSource(results))
    x <- tm_map(x, stripWhitespace)
    x <- tm_map(x, removeNumbers)
    x <- tm_map(x, removePunctuation)
    x <- tm_map(x, stripWhitespace)

    dtm <- DocumentTermMatrix(x)

    for (i in 1:length(common.words)) {
      dtm <- dtm[, !colnames(dtm) %in% c(common.words[i])]
    }

This is the output of str(dtm):

    List of 6
     $ i       : int [1:9769] 1 1 1 1 1 1 1 1 2 2 ...
     $ j       : int [1:9769] 1596 1684 1858 2112 2175 2490 2714 2814 873 961 ...
     $ v       : num [1:9769] 1 1 2 1 1 2 1 1 1 1 ...
     $ nrow    : int 1477
     $ ncol    : int 3201
     $ dimnames:List of 2
      ..$ Docs : chr [1:1477] "1" "2" "3" "4" ...
      ..$ Terms: chr [1:3201] "\u0093\u0085a" "aardvark" "aaron" "abbie" ...
     - attr(*, "class")= chr [1:2] "DocumentTermMatrix" "simple_triplet_matrix"
     - attr(*, "Weighting")= chr [1:2] "term frequency" "tf"

Thanks,

-A

2 answers

That looks like a sparse-matrix layout. The frequencies are in the "v" element, and each one is tied to its term through "j", which holds positions in the "Terms" dimnames. Why not post dput(head(results, 30)) so that your code (and the SO audience) has something to work with? After working through the examples in the package, I suspect that what you really want is something like:

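    tdm <- TermDocumentMatrix(x)
    # Dense rows for the terms of interest, across all documents
    z <- inspect(tdm[c("the", "is", "a"), dimnames(tdm)$Docs])
    # Total count of each term over the whole corpus
    rowSums(z)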
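To make the triplet layout a bit more concrete, here is a minimal base-R sketch. The `toy` list is a made-up stand-in for the structure printed by str(dtm), not the asker's data. It only illustrates how summing "v" within each "j" gives a per-term total.

```r
# Toy stand-in for the structure shown by str(dtm): a simple triplet matrix
# stores only the non-zero cells as (i = document, j = term, v = count).
# All values below are made up for illustration.
toy <- list(
  i = c(1L, 1L, 2L, 2L, 3L),   # row (document) index of each stored cell
  j = c(1L, 3L, 1L, 2L, 3L),   # column (term) index of each stored cell
  v = c(2, 1, 1, 4, 3),        # count stored in that cell
  dimnames = list(Docs  = c("1", "2", "3"),
                  Terms = c("aardvark", "aaron", "abbie"))
)

# Total frequency per term: sum v within each column index j.
# Using a factor over all columns keeps terms that never occur (as NA).
term_index <- factor(toy$j, levels = seq_along(toy$dimnames$Terms))
freq <- tapply(toy$v, term_index, sum)
freq[is.na(freq)] <- 0   # terms with no stored cells have frequency 0

# Attach the term names so the totals can be looked up by term.
freq <- setNames(as.vector(freq), toy$dimnames$Terms)
print(freq)
```

On a real DocumentTermMatrix, and assuming the slam package that tm builds on is installed, slam::col_sums(dtm) should give the same kind of named vector without building a dense matrix.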

I had the same problem and found what I think is an easier way:

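    library(magrittr)  # provides %>%
    num <- 10          # number of terms to show
    # Note: findFreqTerms() returns terms in matrix order, not sorted by frequency
    tdm[findFreqTerms(tdm)[1:num], ] %>% as.matrix() %>% rowSums()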

Printing in columns is more complicated (I'm sure someone has a much better way than this):

    library(dplyr)  # provides %>%, arrange() and desc()
    terms <- findFreqTerms(tdm)[1:num]
    tdm[terms, ] %>%
      as.matrix() %>%
      rowSums() %>%
      data.frame(Term = terms, Frequency = .) %>%
      arrange(desc(Frequency))
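If you would rather avoid the pipe and dplyr, the same kind of table can be sketched in base R. The `freq` vector below is made-up data standing in for the per-term totals (for example, the result of rowSums(as.matrix(tdm))). Sorting before taking the first `num` entries ensures these really are the most frequent terms.

```r
# Made-up per-term totals standing in for rowSums(as.matrix(tdm)).
freq <- c(aardvark = 3, the = 200, is = 400, a = 200, abbie = 1)
num <- 3  # how many terms to keep

# Sort by frequency first, then keep the top `num` terms.
top <- head(sort(freq, decreasing = TRUE), num)

# Two-column table: one row per term, most frequent first.
top_table <- data.frame(Term = names(top), Frequency = unname(top),
                        row.names = NULL)
print(top_table)
```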