Extract part of a string between two different patterns

I am trying to use the stringr package to extract the part of a string that is between two specific patterns.

For example, I have:

 my.string <- "nanaqwertybaba"
 left.border <- "nana"
 right.border <- "baba"

Using the str_extract(string, pattern) function (where pattern is a POSIX regular expression), I would like to get:

 "qwerty" 

The solutions I found through Google do not work.

4 answers

I don't know whether or how this can be done with the functions provided by stringr, but you can also use base regexpr and substring:

 pattern <- paste0("(?<=", left.border, ")[a-z]+(?=", right.border, ")")
 # "(?<=nana)[a-z]+(?=baba)"
 rx <- regexpr(pattern, text=my.string, perl=TRUE)
 # [1] 5
 # attr(,"match.length")
 # [1] 6
 substring(my.string, rx, rx+attr(rx, "match.length")-1)
 # [1] "qwerty"
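As a possible shortcut for the substring arithmetic, base R's regmatches can pull the matched text straight out of a regexpr result. This is only a sketch under the same assumptions as the example (lowercase letters between the borders); the variable names inner and none are introduced here for illustration.

```r
my.string    <- "nanaqwertybaba"
left.border  <- "nana"
right.border <- "baba"

# Lookaround pattern built from the two borders.
# perl = TRUE is needed for (?<=...) and (?=...).
pattern <- paste0("(?<=", left.border, ")[a-z]+(?=", right.border, ")")

# regmatches() extracts the matched text from the regexpr() result,
# so no manual start/length arithmetic is required.
inner <- regmatches(my.string, regexpr(pattern, my.string, perl = TRUE))
inner
# [1] "qwerty"

# With no match, the result is an empty character vector, not an error.
none <- regmatches("xyz", regexpr(pattern, "xyz", perl = TRUE))
length(none)
# [1] 0
```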

In base R, you can use gsub. The parentheses in the pattern create numbered capture groups. Here the replacement selects the second group, i.e. the group between the borders. A . matches any character, and * means zero or more of the preceding element.

 gsub(pattern = "(.*nana)(.*)(baba.*)", replacement = "\\2", x = "xxxnanaRisnicebabayyy")
 # "Risnice"
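Because * is greedy, the result can be surprising when the borders occur more than once. The sketch below uses a made-up input string (not from the question) to compare the greedy pattern with a lazy variant, and shows what comes back when nothing matches. It uses sub rather than gsub, which is equivalent here because the pattern consumes the whole string.

```r
x <- "nanaAbabananaBbaba"   # hypothetical input with two bordered chunks

# Greedy groups: the first .* swallows as much as it can,
# so the LAST nana...baba pair is the one that gets captured.
greedy <- sub("(.*nana)(.*)(baba.*)", "\\2", x)
greedy
# [1] "B"

# Lazy quantifiers (.*?) stop at the first possible border instead.
lazy <- sub("^.*?nana(.*?)baba.*$", "\\1", x, perl = TRUE)
lazy
# [1] "A"

# When the pattern does not match, the input is returned unchanged,
# so a failed extraction is easy to miss.
unmatched <- sub("(.*nana)(.*)(baba.*)", "\\2", "no borders here")
unmatched
# [1] "no borders here"
```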

I would use str_match from stringr: "str_match extracts the capture groups formed by () from the first match. It returns a character matrix with one column for the full match and one column for each group."

 str_match(my.string, paste(left.border, '(.+)', right.border, sep=''))[,2] 

The code above uses paste to build the regular expression: the capture group (.+), which captures one or more characters, sits between the left and right borders, and sep='' joins the pieces with nothing in between.

Only one match is assumed, so [,2] selects the second column of the matrix returned by str_match, which holds the capture group.
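To make the matrix layout concrete, here is a small sketch that inspects both columns. It also tries str_extract with lookarounds, which is closer to what the question asked for; that part is an assumption that the regex engine behind stringr accepts (?<=...) and (?=...) for fixed-length borders like these. The names m and inner are illustrative only.

```r
library(stringr)

my.string    <- "nanaqwertybaba"
left.border  <- "nana"
right.border <- "baba"

# str_match returns a matrix: column 1 is the whole match,
# column 2 is the first capture group.
m <- str_match(my.string, paste0(left.border, "(.+)", right.border))
m[, 1]  # whole match, borders included
m[, 2]  # text between the borders

# Alternative sketch: lookarounds keep the borders out of the match,
# so str_extract returns only the inner text.
inner <- str_extract(
  my.string,
  paste0("(?<=", left.border, ").+(?=", right.border, ")")
)
inner
```

Both approaches treat the borders as regular expressions, so borders containing characters such as . or ( would need escaping first.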


You can use the unglue package:

 library(unglue)
 my.string <- "nanaqwertybaba"
 unglue_vec(my.string, "nana{res}baba")
 #> [1] "qwerty"