Find a repeating pattern in a character string using R

I have a large text containing expressions such as "aaaahahahahaha that was a good joke". After processing, I want "aaaahahahahaha" to disappear, or at least be reduced to a simple "ha".

At the moment I am using this:

gsub('(.+?)\\1', '', str)

This works when the repeated pattern is at the beginning of the sentence, but not when it appears somewhere else. So:

str <- "aaaahahahahaha that was a good joke"
gsub('(.+?)\\1', '', str)
#[1] "ha that was a good joke"

But

str <- "that was aaaahahahahaha a good joke"
gsub('(.+?)\\1', '', str)
#[1] "that was aaaahahahahaha a good joke"

This question may be related to this one: find duplicate pattern in python, but I cannot find an equivalent in R.

\b(\S+?)\1\S*\b

https://regex101.com/r/sJ9gM7/46

In R, use \\b(\\S+?)\\1\\S*\\b with perl=TRUE.
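
As a rough sketch of how the pattern from this answer could be applied to the two strings in the question (the variable names and the `clean` helper are only illustrative, not part of the original answer):

```r
# Pattern from the answer: a word that starts with an immediately
# repeated chunk, followed by whatever is left of that word.
pattern <- "\\b(\\S+?)\\1\\S*\\b"

str1 <- "aaaahahahahaha that was a good joke"
str2 <- "that was aaaahahahahaha a good joke"

# perl = TRUE selects the PCRE engine, as the answer suggests.
removed1 <- gsub(pattern, "", str1, perl = TRUE)
removed2 <- gsub(pattern, "", str2, perl = TRUE)

# Alternatively, reduce the whole word to a plain "ha".
replaced2 <- gsub(pattern, "ha", str2, perl = TRUE)

# Illustrative helper (an assumption, not from the answer):
# collapse the spaces left behind after removing a word.
clean <- function(x) trimws(gsub(" +", " ", x))

print(clean(removed1))  # "that was a good joke"
print(clean(removed2))  # "that was a good joke"
print(replaced2)        # "that was ha a good joke"
```

Note that the pattern only looks at the start of each word, so it would also match ordinary words that begin with a doubled chunk (for example "aardvark"). It may be worth checking the matches against your own text before deleting them.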
