I am trying to make a word cloud from a list of phrases, many of which are repeated, rather than from individual words. My data looks something like this: one column of my data frame holds the phrases.
df$names <- c("John", "John", "Joseph A", "Mary A", "Mary A", "Paul HC", "Paul HC")
I would like to make a word cloud in which each of these names is treated as a single phrase and sized by its frequency, rather than being split into the words that make it up. The code I used looks like this:
library(tm)
library(wordcloud)
library(RColorBrewer)

# build a corpus from the names column and drop English stop words
df.corpus <- Corpus(DataframeSource(data.frame(df$names)))
df.corpus <- tm_map(df.corpus, function(x) removeWords(x, stopwords("english")))

# turn that corpus into a term-document matrix and count each term
tdm <- TermDocumentMatrix(df.corpus)
m <- as.matrix(tdm)
v <- sort(rowSums(m), decreasing = TRUE)
d <- data.frame(word = names(v), freq = v)

# palette (defined here, although the call below plots in plain black)
pal <- brewer.pal(9, "BuGn")
pal <- pal[-(1:2)]

# make the word cloud
png("wordcloud.png", width = 1280, height = 800)
wordcloud(d$word, d$freq, scale = c(8, .3), min.freq = 2, max.words = 100,
          random.order = TRUE, rot.per = .15, colors = "black",
          vfont = c("sans serif", "plain"))
dev.off()
This creates a word cloud, but it is built from every component word, not from the phrases. So I see the relative frequency of "A", "H", "John", etc., instead of the relative frequency of "Joseph A", "Mary A", etc., which is what I want.
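To make the target concrete, here is a minimal sketch of the per-phrase counts I am after, computed with base R's table() on the example data above. The names `phrases`, `phrase_freq` and `phrase_df` are just illustrative. The guarded call at the end is an assumption on my part: since wordcloud() takes a vector of words and a vector of frequencies, handing it these counts directly might avoid the tokenization step, but I have not confirmed that this is the recommended approach.

```r
# Example data from above (each element is one whole phrase)
phrases <- c("John", "John", "Joseph A", "Mary A", "Mary A", "Paul HC", "Paul HC")

# Count whole phrases; table() never splits them into words
phrase_freq <- sort(table(phrases), decreasing = TRUE)

# Same shape as the data frame built from the term-document matrix
phrase_df <- data.frame(
  word = names(phrase_freq),
  freq = as.integer(phrase_freq),
  stringsAsFactors = FALSE
)
print(phrase_df)

# Assumption: pass the phrase counts straight to wordcloud(),
# skipping the corpus and term-document matrix entirely.
if (requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::wordcloud(phrase_df$word, phrase_df$freq, min.freq = 1)
}
```

In this table "Joseph A" and "Mary A" are counted as separate entries, which is the behaviour I want the cloud to reflect.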
I am sure this is not difficult to fix, but I cannot figure it out! I would appreciate any help.