Strsplit in parentheses

Suppose I have a string like "ABC (123-456-789)". What is the best way to get "123-456-789" from it? With strsplit I only get this far:

strsplit("ABC (123-456-789)", "\\(")
[[1]]
[1] "ABC "         "123-456-789)"
5 answers

If we want to extract the digits and hyphens between the parentheses, one option is str_extract from stringr. If a string contains several such matches, use str_extract_all.

  library(stringr)
  str_extract(str1, '(?<=\\()[0-9-]+(?=\\))')
  #[1] "123-456-789"
  str_extract_all(str2, '(?<=\\()[0-9-]+(?=\\))')

In the code above, the regular expression matches a run of digits and hyphens ( [0-9-]+ ). The positive lookbehind (?<=\\() requires an opening parenthesis immediately before the match, so (?<=\\()[0-9-]+ matches in (123-456-789 but not in 123-456-789 . Similarly, the positive lookahead (?=\\)) requires a closing parenthesis immediately after the match, so [0-9-]+(?=\\)) matches in 123-456-789) but not in 123-456-789 . Taken together, the pattern matches only text that satisfies both conditions, as in (123-456-789) . The text between the parentheses is extracted, and cases such as (123-456-789 or 123-456-789) do not match.
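To see the two lookarounds at work separately, here is a small sketch in base R. It uses regmatches() and regexpr() with perl = TRUE instead of stringr so that it runs without extra packages. The helper name extract_first and the three test strings are made up for illustration.

```r
# Hypothetical helper: return the first match of `pattern` in `x`,
# or character(0) when nothing matches.
extract_first <- function(x, pattern) {
  regmatches(x, regexpr(pattern, x, perl = TRUE))
}

both_sides <- "(123-456-789)"   # opening and closing parenthesis
open_only  <- "(123-456-789"    # opening parenthesis only
close_only <- "123-456-789)"    # closing parenthesis only

# Lookbehind alone: only needs "(" immediately before the match.
lb_open  <- extract_first(open_only,  "(?<=\\()[0-9-]+")
lb_close <- extract_first(close_only, "(?<=\\()[0-9-]+")

# Lookahead alone: only needs ")" immediately after the match.
la_open  <- extract_first(open_only,  "[0-9-]+(?=\\))")
la_close <- extract_first(close_only, "[0-9-]+(?=\\))")

# Both together: needs "(" before and ")" after.
full      <- "(?<=\\()[0-9-]+(?=\\))"
both_ok   <- extract_first(both_sides, full)
both_open <- extract_first(open_only,  full)
```

Under these assumptions, only both_sides satisfies the combined pattern; the half-open strings match just one of the single-lookaround patterns each.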

With strsplit, we can specify split as [()] . Placing the parentheses inside a character class ( [] ) makes them literal characters; otherwise we need to escape them ( '\\(|\\)' ).

  strsplit(str1, '[()]')[[1]][2]
  #[1] "123-456-789"
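As a quick check of the claim that the character class and the escaped alternation are interchangeable, the following sketch splits the same string both ways. The variable names are chosen here for illustration only.

```r
s <- "ABC (123-456-789)"

# Parentheses inside a character class are literal characters.
by_class <- strsplit(s, "[()]")[[1]]

# Outside a character class they have to be escaped.
by_escape <- strsplit(s, "\\(|\\)")[[1]]

# Both calls produce the same pieces; the second piece is the
# text between the parentheses.
same_result <- identical(by_class, by_escape)
inner <- by_class[2]
```

Note that the first piece keeps the trailing space of "ABC ", which is why the answer indexes the second element rather than cleaning up the first.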

If there are several substrings to extract from a string, we can loop over the split results with lapply and keep only the pieces that contain digits with grep :

  lapply(strsplit(str2, '[()]'), function(x) grep('\\d', x, value=TRUE)) 

Or we can use stri_split_regex from stringi , which can also drop empty strings from the result ( omit_empty=TRUE ).

  library(stringi)
  stri_split_regex(str1, '[()A-Z ]', omit_empty=TRUE)[[1]]
  #[1] "123-456-789"
  stri_split_regex(str2, '[()A-Z ]', omit_empty=TRUE)

Another option is rm_round from qdapRegex , if we are interested in extracting whatever is inside the parentheses.

  library(qdapRegex)
  rm_round(str1, extract=TRUE)[[1]]
  #[1] "123-456-789"
  rm_round(str2, extract=TRUE)

data

  str1 <- "ABC (123-456-789)"
  str2 <- c("ABC (123-425-478) A", "ABC(123-423-428)", "(123-423-498) ABCDD",
            "(123-432-423)", "ABC (123-423-389) GR (124-233-848) AK")

Or with sub from base R:

 sub("[^(]+\\(([^)]+)\\).*", "\\1", "ABC (123-456-789)")
 #[1] "123-456-789"

Explanation:

[^(]+ : matches one or more characters other than an opening parenthesis
\\( : matches the opening parenthesis right in front of the part you want
([^)]+) : captures the part you want, that is, one or more characters other than a closing parenthesis (the replacement "\\1" then returns this group)
\\).* : matches the closing parenthesis followed by any characters, zero or more times
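One consequence of the pattern as explained: [^(]+ needs at least one character before the opening parenthesis, and sub returns its input unchanged when the pattern does not match. The sketch below illustrates both points; the relaxed pattern with [^(]* is a variation suggested here, not part of the original answer.

```r
strict  <- "[^(]+\\(([^)]+)\\).*"   # pattern from the answer
relaxed <- "[^(]*\\(([^)]+)\\).*"   # assumed variant: prefix may be empty

# Usual case: some text precedes the parenthesis.
with_prefix <- sub(strict, "\\1", "ABC (123-456-789)")

# String starts with "(": the strict pattern finds no match,
# so sub() hands back the input untouched.
no_prefix_strict  <- sub(strict,  "\\1", "(123-432-423)")
no_prefix_relaxed <- sub(relaxed, "\\1", "(123-432-423)")

# No parentheses at all: nothing to extract, input comes back as is.
no_parens <- sub(strict, "\\1", "ABC 123")
```

If unmatched inputs need to be told apart from real extractions, checking with grepl(pattern, x) before calling sub is one simple option.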

Another option, using lookbehind and lookahead:

 sub(".*(?<=\\()(.+)(?=\\)).*", "\\1", "ABC (123-456-789)", perl=TRUE)
 #[1] "123-456-789"

The capture groups in sub will target your desired result:

 sub('.*\\((.*)\\).*', '\\1', str1)
 [1] "123-456-789"

An extra check to make sure it also passes @akrun's extended example:

 sub('.*\\((.*)\\).*', '\\1', str2)
 [1] "123-425-478" "123-423-428" "123-423-498" "123-432-423" "124-233-848"
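Because the leading .* is greedy, this sub call returns only the last parenthesized group when a string contains more than one, as the final element of the output above shows. If every group is wanted, one possible approach (a sketch, not part of the original answer) is gregexpr with regmatches in base R:

```r
s <- "ABC (123-423-389) GR (124-233-848) AK"

# The greedy ".*" runs up to the last "(", so only the last group survives.
last_group <- sub('.*\\((.*)\\).*', '\\1', s)

# gregexpr() finds every match; [^()]+ keeps each match inside
# a single pair of parentheses.
all_groups <- regmatches(
  s,
  gregexpr('(?<=\\()[^()]+(?=\\))', s, perl = TRUE)
)[[1]]
```

Here last_group holds a single string, while all_groups is a character vector with one element per pair of parentheses.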

Try also:

  k <- "ABC (123-456-789)"
  regmatches(k, gregexpr("*.(\\d+).*", k))[[1]]
  [1] "(123-456-789)"

With a suggestion from @Arun:

 regmatches(k, gregexpr('(?<=\\()[^A-Z ]+(?=\\))', k, perl=TRUE))[[1]]

With a suggestion from @akrun:

 regmatches(k, gregexpr('[0-9-]+', k))[[1]] 

You can try these gsub calls.

 > x <- "ABC (123-456-789)"  # assumed input, taken from the question
 > gsub("[^\\d-]", "", x, perl=T)
 [1] "123-456-789"
 > gsub(".*\\(|\\)", "", x)
 [1] "123-456-789"
 > gsub("[^0-9-]", "", x)
 [1] "123-456-789"

A few more:

 > gsub("[0-9-](*SKIP)(*F)|.", "", x, perl=T)
 [1] "123-456-789"
 > gsub("(?:(?![0-9-]).)*", "", x, perl=T)
 [1] "123-456-789"