I can easily match repeated words using:
"(?i)\\b(\\w+)(((\\.{3}\\s*|,\\s+)*|\\s+)\\1)+\\b"
but this regular expression does not apply to multiple words (nor should it, in its current state). How can I find duplicate phrases using a regular expression?
Here I extract duplicated terms (regardless of case), but the same regular expression fails to extract a repeated phrase:
library(qdapRegex)
rm_default("this is a big Big deal", pattern = "(?i)\\b(\\w+)(((\\.{3}\\s*|,\\s+)*|\\s+)\\1)+\\b", extract=TRUE)
rm_default("this is a big is a Big deal", pattern = "(?i)\\b(\\w+)(((\\.{3}\\s*|,\\s+)*|\\s+)\\1)+\\b", extract=TRUE)
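For anyone without qdapRegex installed, the same behaviour can be reproduced with base R. This is only a sketch: it assumes `regmatches()` and `gregexpr()` with `perl = TRUE` stand in for `rm_default(..., extract = TRUE)`, and the names `pat_word`, `x1` and `word_hits` are made up for illustration.

```r
# Base R sketch of the two calls above (no qdapRegex needed).
# Same single-word pattern as in the question.
pat_word <- "(?i)\\b(\\w+)(((\\.{3}\\s*|,\\s+)*|\\s+)\\1)+\\b"

x1 <- c("this is a big Big deal", "this is a big is a Big deal")

# gregexpr() finds every match; regmatches() pulls out the matched text.
word_hits <- regmatches(x1, gregexpr(pat_word, x1, perl = TRUE))

word_hits[[1]]  # "big Big": the repeated single word is found
word_hits[[2]]  # character(0): the repeated phrase is missed
```

Note that base R reports "no match" as `character(0)`, whereas the desired output in this question uses `NA`.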
I am hoping for a regex that will return:
"is a big is a Big"
for
x <- "this is a big is a Big deal"
To cover corner cases, here is a larger test set and the desired output:
y <- c(
    "this is a big is a Big deal",
    "I want want to see",
    "I want, want to see",
    "I want...want to see see how",
    "this is a big is a Big deal for those of, those of you who are.",
    "I like it. It is cool"
)
[[1]]
[1] "is a big is a Big"
[[2]]
[1] "want want"
[[3]]
[1] "want, want"
[[4]]
[1] "want...want" "see see"
[[5]]
[1] "is a big is a Big" "those of, those of"
[[6]]
[1] NA
My current regex only gets me the single-word repeats:
rm_default(y, pattern = "(?i)\\b(\\w+)(((\\.{3}\\s*|,\\s+)*|\\s+)\\1)+\\b", extract=TRUE)
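One direction that might work is to let the captured group span several words, i.e. replace `(\\w+)` with `(\\w.*)` and collapse the separators into a single alternation. The sketch below is an assumption on my part, not a vetted solution: it uses base R with `perl = TRUE` instead of `rm_default()`, the names `pat_phrase` and `phrase_hits` are hypothetical, and it has only been reasoned through against the six test strings in this question.

```r
# Sketch: allow the back-referenced group to be a multi-word phrase.
# (\w.*)              a phrase starting with a word character (greedy)
# (?:\s|\.{3}|,)+     one or more separators: whitespace, "...", or ","
# \1                  the same phrase again (case-insensitive via (?i))
pat_phrase <- "(?i)\\b(\\w.*)((?:\\s|\\.{3}|,)+\\1)+\\b"

y <- c(
    "this is a big is a Big deal",
    "I want want to see",
    "I want, want to see",
    "I want...want to see see how",
    "this is a big is a Big deal for those of, those of you who are.",
    "I like it. It is cool"
)

# One character vector of matches per input string.
phrase_hits <- regmatches(y, gregexpr(pat_phrase, y, perl = TRUE))
phrase_hits
```

Strings with no repeat come back as `character(0)` here rather than `NA`. Because `.*` is greedy and has to backtrack to find the repeat, this pattern may be slow on long inputs.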