I am using R. I have a list of words called "goodwords.corpus". I loop through the documents in the corpus and replace each word from "goodwords.corpus" with the word plus a number.
So, for example, if the word "good" is in the list and "goodnight" is NOT in the list, then this document:
I am having a good time goodnight
will turn into:
I am having a good 1234 time goodnight
I am using this code (EDIT - made it reproducible):
goodwords.corpus <- c("good")
test <- "I am having a good time goodnight"
for (i in 1:length(goodwords.corpus)){
  test <- gsub(goodwords.corpus[[i]], paste(goodwords.corpus[[i]], "1234"), test)
}
However, I want gsub to replace only ENTIRE words. "good" is in "goodwords.corpus", but it is also part of "goodnight", which is NOT in the list. So I get the following:
I am having a good 1234 time good 1234night
Is there any way I can tell gsub to replace only ENTIRE words, and not words that are part of other words?
I want to use this:
test <- gsub("\\<goodwords.corpus[[i]]\\>", paste(goodwords.corpus[[i]], "1234"), test)
I read that \\< and \\> tell gsub to match only whole words. But obviously this does not work, because goodwords.corpus[[i]] is not evaluated when it is inside the quotation marks; it is treated as literal text in the pattern.
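One possible way around the quoting issue, as a minimal sketch rather than a definitive fix: build the pattern string with paste0() so the word itself ends up between the boundary markers. The sketch below uses \\b (word boundary) with perl = TRUE instead of \\< and \\>; that swap is my assumption, as is the variable name pattern. It also assumes the words in the list contain no regex metacharacters (such as "." or "+"), which would otherwise need escaping.

```r
goodwords.corpus <- c("good")
test <- "I am having a good time goodnight"

for (i in seq_along(goodwords.corpus)) {
  # Paste the word between word-boundary markers so it is part of the
  # pattern, instead of the literal text "goodwords.corpus[[i]]".
  pattern <- paste0("\\b", goodwords.corpus[[i]], "\\b")
  # Replace only whole-word matches with "<word> 1234".
  test <- gsub(pattern, paste(goodwords.corpus[[i]], "1234"), test, perl = TRUE)
}

test
# "I am having a good 1234 time goodnight"
```

Here "goodnight" is left alone because there is no word boundary between "good" and "night".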
Any suggestions?
r topic-modeling gsub
user2303557