I'm trying to use the topicmodels package for R (100 topics on a corpus of ~6400 documents, each of which is 1000 words). The process starts and then dies, I think because it runs out of memory.
So I am trying to reduce the size of the document-term matrix that the LDA() function takes as input; I suppose I can do this with the minDocFreq setting when I generate the document-term matrix. But when I use it, it does not seem to make any difference. Here is the corresponding bit of code:
> corpus <- Corpus(DirSource('./chunks/'),fileEncoding='utf-8')
> dtm <- DocumentTermMatrix(corpus)
> dim(dtm)
[1] 6423 4163
With minDocFreq set, the resulting matrix has the same dimensions: the same number of rows and the same number of columns (i.e. the same number of terms).
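To make the expected behavior concrete, here is a minimal sketch of the kind of reduction I am after, on a toy corpus rather than my real data. The three sample documents and the variable names (docs, toy, full, bounded, trimmed) are made up for illustration. It assumes a tm version in which DocumentTermMatrix() accepts a bounds entry in its control list and in which removeSparseTerms() is available; I have not confirmed that either route is the right replacement for minDocFreq.

```r
library(tm)

# Toy corpus (hypothetical data, standing in for the files in ./chunks/)
docs <- c("apple banana cherry",
          "apple banana date",
          "apple elderberry fig")
toy <- VCorpus(VectorSource(docs))

# Unfiltered document-term matrix: one column per distinct term
full <- DocumentTermMatrix(toy)
dim(full)

# Option 1 (assumption: the installed tm supports `bounds`):
# keep only terms that occur in at least 2 documents
bounded <- DocumentTermMatrix(
  toy,
  control = list(bounds = list(global = c(2, Inf)))
)
dim(bounded)

# Option 2: filter an existing matrix, dropping terms that are
# missing from half or more of the documents
trimmed <- removeSparseTerms(full, 0.5)
dim(trimmed)
```

On the toy data both filtered matrices should keep the same rows but fewer columns than full, which is what I expected minDocFreq to do on the real corpus.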
Any idea what I'm doing wrong? Thanks.