Replace the capture group with repeats of one character, while maintaining the length of the capture group

Suppose you want to replace AXA with AAA, as well as AXXXXXA with AAAAAAA.

Basically, replace any number of X characters between two A's with the corresponding number of A's.

Using gsub(), I tried:

gsub(x = "AXA", pattern = "(A)(X+)(\\1)", replacement = "\\1\\1\\1")

which gives AAA. However, it gives AAA no matter how long the X+ match is. How can I access the length of subgroup 2 in the replacement?
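To make the problem concrete, here is a small sketch that runs the attempt above on both example inputs. The variable names `inputs` and `result` are only for illustration.

```r
# Both inputs collapse to the same fixed-length result, because the
# replacement string "\\1\\1\\1" always expands to exactly three A's.
inputs <- c("AXA", "AXXXXXA")
result <- gsub(pattern = "(A)(X+)(\\1)", replacement = "\\1\\1\\1", x = inputs)
print(result)
# [1] "AAA" "AAA"
```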

Possible duplicate: Replace duplicate character with another duplicate character

But IMHO that question is quite different from this one.

1 answer

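The problem is that group 1 captures A, and \\1 refers back to that A. So the replacement always yields 3 As. One solution: match each X that sits between an A and an A, and replace it with A. This needs a Perl-compatible regex: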

input = "AXXXA"
gsub("(?:A|(?<!^)\\G)\\KX(?=X*A)", "A", input, perl=TRUE)

Output:

[1] "AAAAA"

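\G matches at the position where the previous match ended (the (?<!^) keeps it from matching at the start of the string), and \K drops everything matched so far from the match, so the leading A is not replaced. (?=X*A) is a look-ahead that checks the X is followed by zero or more X and then an A.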
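For comparison, a different route (not the one used in this answer) is to compute the length of each run explicitly. The sketch below assumes R 3.3.0 or later for `strrep()`, and `replace_between` is a hypothetical helper name. It finds each run of X with an A on both sides, then overwrites it with the same number of A's.

```r
# Hypothetical helper: replace every run of X that has an A directly before
# and after it with a run of `fill` of the same length.
replace_between <- function(x, fill = "A") {
  # Look-behind and look-ahead keep the surrounding A's out of the match,
  # so neighbouring runs can share an A (as in "AXAXA").
  m <- gregexpr("(?<=A)X+(?=A)", x, perl = TRUE)
  # nchar() gives the length of each matched run; strrep() repeats `fill`.
  regmatches(x, m) <- lapply(regmatches(x, m),
                             function(run) strrep(fill, nchar(run)))
  x
}

replace_between(c("AXA", "AXXXXXA"))
# [1] "AAA"     "AAAAAAA"
```

This trades the single gsub() call for two steps, but it exposes the run length directly, which is what the question asks about.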

EDIT:

A more general example (say, replacing each Xyz between two 123 with A):

input = "123XyzXyzXyz123"
gsub("(?:123|(?<!^)\\G)\\KXyz(?=(?:Xyz)*123)", "A", input, perl=TRUE)

Output: [1] "123AAA123"
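Both patterns so far have the same shape, so they can be built from their parts. The following is only a sketch: `fill_between` is a hypothetical helper, and it assumes `delim` and `token` are already valid regex fragments (anything containing regex metacharacters would need escaping first).

```r
# Hypothetical helper: build the \G / \K pattern from a delimiter and a
# repeated token, then replace each token between two delimiters with `fill`.
fill_between <- function(x, delim, token, fill) {
  pattern <- sprintf("(?:%s|(?<!^)\\G)\\K%s(?=(?:%s)*%s)",
                     delim, token, token, delim)
  gsub(pattern, fill, x, perl = TRUE)
}

fill_between("123XyzXyzXyz123", delim = "123", token = "Xyz", fill = "A")
# [1] "123AAA123"
fill_between("AXXXA", delim = "A", token = "X", fill = "A")
# [1] "AAAAA"
```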

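EDIT 2:

To replace any letters between 2 A's, use \p{L} (any Unicode letter) in place of X, again replacing each one with A: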

gsub("(?:A|(?<!^)\\G)\\K\\p{L}(?=\\p{L}*A)", "A", input, perl=TRUE)
=> [1] "XSDFAAAAAA"