If you are working with R, the package lda contains a function lexicalize that converts the raw text into the lda-c format needed for the package lda.
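library(lda)

# Two toy documents, one string per document
example <- c("I am the very model of a modern major general",
             "I have a major headache")
# Tokenize and index the documents; lower=TRUE lowercases the text first
corpus <- lexicalize(example, lower=TRUE)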
Similarly, the package topicmodels has a function dtm2ldaformat that converts a document-term matrix to the lda format. You can convert a simple text document into a document-term matrix using the package tm, also in R.
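As a rough sketch of that second route, the snippet below builds a document-term matrix with tm and passes it to dtm2ldaformat. It assumes the tm and topicmodels packages are installed; the variable names (docs, dtm, ldaformat) are only illustrative, and the default tm tokenization settings are used, so very short words are dropped.

```r
library(tm)
library(topicmodels)

example <- c("I am the very model of a modern major general",
             "I have a major headache")

# Wrap the character vector in a tm corpus, one document per element
docs <- VCorpus(VectorSource(example))

# Build the document-term matrix with tm's default settings
dtm <- DocumentTermMatrix(docs)

# Convert to the list format (documents + vocab) used by the lda package
ldaformat <- dtm2ldaformat(dtm)

str(ldaformat$documents)
ldaformat$vocab
```

The resulting list has a documents component and a vocab component, which are the pieces the lda package's samplers expect as input.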
Thus, with these existing functions, there is considerable flexibility in getting text into R in a form suitable for topic modeling.