Lookahead and lookbehind regex in R

I am trying to use regular expressions with the stringr package to extract some text. For some reason, I get the error "Invalid regexp". I tried the regular expression in some online testing tools and it seems to work there. I was wondering if there is something unique about how regular expressions work in R, and especially in the stringr package.

Here is an example:

    library(stringr)

    string <- c("MARKETING: Vice President", "FINANCE: Accountant I", "OPERATIONS: Plant Manager")
    pattern <- "[A-Z]+(?=:)"
    test <- gsub(" ", "", string)
    results <- str_extract(test, pattern)

This does not work. I would like to get "MARKETING", "FINANCE" and "OPERATIONS" without ":" in them. This is why I use lookahead syntax. I understand that I can just get around this using:

    pattern <- "[A-Z]+(:)"
    test <- gsub(" ", "", string)
    results <- gsub(":", "", str_extract(test, pattern))

But I expect that I may need lookarounds for more complex situations than this in the near future.

Do I need to adjust the regex with some escapes or something to make this work?

+4
2 answers

Lookahead assertions require that you mark the regular expression as a Perl regular expression in R.

    str_extract(string, perl(pattern))
    # [1] "MARKETING"  "FINANCE"    "OPERATIONS"

You can also do this easily in base R:

    regmatches(string, regexpr(pattern, string, perl = TRUE))
    # [1] "MARKETING"  "FINANCE"    "OPERATIONS"

regexpr finds the matches, and regmatches uses that match data to extract the substrings.
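To make that division of labor a bit more concrete, here is a minimal base R sketch of what the two functions are doing. The helper variable names (starts, lens, by_hand, extracted) are my own for illustration, and the manual substring() step is only a rough equivalent of regmatches, not how it is implemented.

```r
string <- c("MARKETING: Vice President", "FINANCE: Accountant I", "OPERATIONS: Plant Manager")
pattern <- "[A-Z]+(?=:)"

# regexpr() returns the start position of the first match in each element,
# with the match lengths attached as the "match.length" attribute.
m <- regexpr(pattern, string, perl = TRUE)
starts <- as.integer(m)
lens <- attr(m, "match.length")

# Cutting the spans out by hand gives the same result as regmatches()
# when every element matches, as it does here.
by_hand <- substring(string, starts, starts + lens - 1)
extracted <- regmatches(string, m)

print(extracted)
```

One thing to keep in mind: regmatches drops elements that have no match, so the result can be shorter than the input vector.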

+5

You can do this directly with sub and grouping.

    sub('^([A-Z]+):.*$', '\\1', string)
    # [1] "MARKETING"  "FINANCE"    "OPERATIONS"

Here I anchor the group to the beginning of the string, match one or more capital letters, and capture them. They must be followed by a colon, :, and then zero or more additional characters.
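For reference, the same pattern broken down piece by piece, as a small sketch. The second call uses a made-up input ("no department here") purely to illustrate what happens when nothing matches; it is not from the question.

```r
string <- c("MARKETING: Vice President", "FINANCE: Accountant I", "OPERATIONS: Plant Manager")

# ^          anchor at the start of the string
# ([A-Z]+)   capture one or more capital letters as group 1
# :          require a literal colon right after the group
# .*$        consume the rest of the string so it gets replaced too
departments <- sub("^([A-Z]+):.*$", "\\1", string)

# Hypothetical input with no leading "CAPS:" prefix.
# sub() returns a non-matching string unchanged rather than NA.
no_prefix <- sub("^([A-Z]+):.*$", "\\1", "no department here")

print(departments)
print(no_prefix)
```

Because non-matching strings come back unchanged, this approach may need an extra check (for example with grepl) if some of your inputs might not follow the "DEPARTMENT: title" shape.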

+2
