I am trying to do some topic modeling, but I want to use phrases where they exist rather than individual words.
    library(topicmodels)
    library(tm)
    my.docs = c('the sky is blue, hot sun', 'flowers,hot sun', 'black cats, bees, rats and mice')
    my.corpus = Corpus(VectorSource(my.docs))
    my.dtm = DocumentTermMatrix(my.corpus)
    inspect(my.dtm)
When I inspect my DTM, every phrase has been broken up into individual words, but I need the phrases kept together. That is, there should be one column for each of: sky blue, hot sun, flowers, black cats, bees, rats and mice.
How can I make the document-term matrix recognize phrases as well as single words? The phrases are separated by commas.
The solution needs to be efficient, because I want to run it on a lot of data.
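To make the desired result concrete, here is a minimal sketch of one possible approach. It bypasses the `tm` tokenizer entirely: each document is split on commas and the phrase counts are stored in a sparse matrix from the `Matrix` package. The names `phrases`, `doc.id`, `term.f` and `phrase.dtm` are made up for illustration, and using `Matrix` instead of a `tm` `DocumentTermMatrix` is an assumption, not something the setup above requires.

```r
library(Matrix)

# Example documents from the question; phrases are separated by commas.
my.docs <- c("the sky is blue, hot sun",
             "flowers,hot sun",
             "black cats, bees, rats and mice")

# Split each document on commas and trim surrounding whitespace,
# so every comma-delimited phrase becomes a single term.
phrases <- lapply(strsplit(my.docs, ",", fixed = TRUE), trimws)

# Long format: one entry per (document, phrase) occurrence.
doc.id <- rep(seq_along(phrases), lengths(phrases))
term   <- unlist(phrases)

# Drop empty phrases, e.g. those produced by two consecutive commas.
keep   <- nzchar(term)
doc.id <- doc.id[keep]
term.f <- factor(term[keep])

# Sparse document-by-phrase count matrix (illustrative name).
# sparseMatrix() sums x over repeated (i, j) pairs, which yields counts.
phrase.dtm <- sparseMatrix(i = doc.id,
                           j = as.integer(term.f),
                           x = 1,
                           dims = c(length(my.docs), nlevels(term.f)),
                           dimnames = list(as.character(seq_along(my.docs)),
                                           levels(term.f)))
print(phrase.dtm)
```

Because `strsplit()` and `sparseMatrix()` are both vectorized and the result stays sparse, this should scale reasonably to many documents, although that has not been measured here. Note that this sketch keeps the first phrase as the full comma-delimited string `the sky is blue`; no stop words are removed.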