Strsplit in parentheses

Question


Suppose I have a string like "ABC (123-456-789)". What is the best way to get "123-456-789" from it?

 strsplit("ABC (123-456-789)", "\\(")
 [[1]]
 [1] "ABC "          "123-456-789)"
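A minimal sketch of finishing that strsplit attempt, assuming each string contains exactly one "(" and that the closing ")" is its last character. The names s, parts and inside are illustrative only.

```r
s <- "ABC (123-456-789)"
# Splitting on "(" alone leaves the closing ")" attached to the second piece
parts <- strsplit(s, "\\(")[[1]]
# parts is c("ABC ", "123-456-789)"); strip the trailing ")" from piece 2
inside <- sub("\\)$", "", parts[2])
print(inside)
```

This breaks as soon as a string has text after the ")" or more than one group, which is what the answers below address.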


regex r

David z Jul 08 '15 at 12:34


5 answers

Or with sub from base R:

 sub("[^(]+\\(([^)]+)\\).*", "\\1", "ABC (123-456-789)")
 #[1] "123-456-789"

Explanation:

[^(]+ : matches one or more characters other than an opening parenthesis
\\( : matches the opening parenthesis right in front of what you want
([^)]+) : matches the pattern you want to capture (which is then extracted by replacement="\\1"), i.e. one or more characters other than a closing parenthesis
\\).* : matches a closing parenthesis followed by any characters, zero or more times
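To see what "\\1" refers to, regexec() with regmatches() returns the whole match followed by each capture group. The sketch also shows one consequence of "[^(]+": it needs at least one character before "(", so a string that starts with "(" does not match and sub() returns it unchanged. The relaxed pattern using "[^(]*" is a hypothetical variant, not part of the answer above; the variable names are illustrative.

```r
x <- c("ABC (123-456-789)", "(123-432-423)")
pattern <- "[^(]+\\(([^)]+)\\).*"

# m[1] is the whole match, m[2] is capture group 1 (what "\\1" inserts)
m <- regmatches(x[1], regexec(pattern, x[1]))[[1]]
print(m)

# "[^(]+" requires a character before "(", so the second string does not
# match at all and sub() returns it unchanged
print(sub(pattern, "\\1", x))

# Hypothetical variant: "[^(]*" also accepts strings that start with "("
relaxed <- "[^(]*\\(([^)]+)\\).*"
print(sub(relaxed, "\\1", x))
```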

Another option: lookbehind and lookahead

 sub(".*(?<=\\()(.+)(?=\\)).*", "\\1", "ABC (123-456-789)", perl=TRUE)
 #[1] "123-456-789"
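A small sketch of why the lookarounds keep the parentheses out of the result: (?<=\\() and (?=\\)) are zero-width, so they check for the parentheses without consuming them. It uses regexpr() without the surrounding ".*" to expose the match position, and then shows a caveat of the greedy ".+" on a string with two groups (taken from the str2 data in the accepted answer). The [^)]+ alternative is an illustration, not part of the answer above.

```r
x <- "ABC (123-456-789)"
# Zero-width lookbehind/lookahead: the match starts just after "("
m <- regexpr("(?<=\\().+(?=\\))", x, perl = TRUE)
print(as.integer(m))            # start position: 6
print(attr(m, "match.length"))  # 11 characters
print(regmatches(x, m))

# Caveat: ".+" is greedy, so with two groups the match runs from the
# first "(" to the last ")"
y <- "ABC (123-423-389) GR (124-233-848) AK"
greedy <- regmatches(y, regexpr("(?<=\\().+(?=\\))", y, perl = TRUE))
print(greedy)
# Excluding ")" from the repeated class stops at the first closing parenthesis
first <- regmatches(y, regexpr("(?<=\\()[^)]+(?=\\))", y, perl = TRUE))
print(first)
```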


Cath Jul 08 '15 at 12:40


A capture group in sub targets the desired result:

 sub('.*\\((.*)\\).*', '\\1', str1)
 [1] "123-456-789"

An extra check to make sure it passes @akrun's extended example:

 sub('.*\\((.*)\\).*', '\\1', str2)
 [1] "123-425-478" "123-423-428" "123-423-498" "123-432-423" "124-233-848"
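Note that the last element returns only its second group: the greedy ".*" before "\\(" runs to the last "(" in the string. A sketch contrasting it with a lazy version (PCRE via perl = TRUE, a swapped-in variant rather than part of this answer) that stops at the first pair of parentheses instead; str2 is the data from the accepted answer.

```r
str2 <- c("ABC (123-425-478) A", "ABC(123-423-428)", "(123-423-498) ABCDD",
          "(123-432-423)", "ABC (123-423-389) GR (124-233-848) AK")

# Greedy ".*": skips ahead to the LAST "(" before capturing
greedy <- sub('.*\\((.*)\\).*', '\\1', str2)

# Lazy ".*?" (PCRE): captures the FIRST parenthesised group instead
lazy <- sub('^.*?\\((.*?)\\).*$', '\\1', str2, perl = TRUE)

print(greedy)
print(lazy)
```

Either version still returns a single group per element; to get every group, see the gregexpr-based answers.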


Pierre lafortune Jul 08 '15 at 12:48


Try also:

 k <- "ABC (123-456-789)"
 regmatches(k, gregexpr(".(\\d+).*", k))[[1]]  # note: the match includes the parentheses
 [1] "(123-456-789)"

With a suggestion from @Arun:

 regmatches(k, gregexpr('(?<=\\()[^A-Z ]+(?=\\))', k, perl=TRUE))[[1]]

With a suggestion from @akrun:

 regmatches(k, gregexpr('[0-9-]+', k))[[1]]
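With gregexpr, '[0-9-]+' returns every run of digits and hyphens, so an element with two groups yields two matches. The class ignores the parentheses entirely, though, so digits outside them are picked up too. A sketch using the str2 data from the accepted answer plus one made-up input, z, to show that caveat:

```r
str2 <- c("ABC (123-425-478) A", "ABC(123-423-428)", "(123-423-498) ABCDD",
          "(123-432-423)", "ABC (123-423-389) GR (124-233-848) AK")

# One character vector per element; the fifth element has two groups
runs <- regmatches(str2, gregexpr('[0-9-]+', str2))
print(lengths(runs))
print(runs[[5]])

# Caveat (hypothetical input): a digit outside the parentheses also matches
z <- "AB1 (123-456-789)"
print(regmatches(z, gregexpr('[0-9-]+', z))[[1]])
```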


user227710 Jul 08 '15 at 12:46


You can try these gsub approaches:

 x <- "ABC (123-456-789)"
 > gsub("[^\\d-]", "", x, perl=TRUE)
 [1] "123-456-789"
 > gsub(".*\\(|\\)", "", x)
 [1] "123-456-789"
 > gsub("[^0-9-]", "", x)
 [1] "123-456-789"

A few more:

 > gsub("[0-9-](*SKIP)(*F)|.", "", x, perl=TRUE)
 [1] "123-456-789"
 > gsub("(?:(?![0-9-]).)*", "", x, perl=TRUE)
 [1] "123-456-789"
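These gsub() calls work by deleting every character that is not a digit or "-", rather than by locating the parentheses. A sketch of what that means for a string with two groups (y, taken from the accepted answer's str2 data): the groups are glued together. In the (*SKIP)(*F) version, a digit or "-" is matched, skipped and then forced to fail, so only the other single characters (".") are matched and removed.

```r
x <- "ABC (123-456-789)"
y <- "ABC (123-423-389) GR (124-233-848) AK"

# Keep-only-digits-and-hyphens: fine for x, but concatenates the groups in y
print(gsub("[^0-9-]", "", x))
print(gsub("[^0-9-]", "", y))

# (*SKIP)(*F): digits and "-" are never replaced; every other character is
skip_fail <- gsub("[0-9-](*SKIP)(*F)|.", "", y, perl = TRUE)
print(skip_fail)
```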


Avinash raj Jul 08 '15 at 13:23


akrun · Accepted Answer · 2015-07-08T12:38:00+0000

If we want to extract the digits and - between parentheses, one option is str_extract. If a string contains several such groups, use str_extract_all.

  library(stringr)
  str_extract(str1, '(?<=\\()[0-9-]+(?=\\))')
  #[1] "123-456-789"
  str_extract_all(str2, '(?<=\\()[0-9-]+(?=\\))')

In the code above, we use a regular expression to extract the digits and - . The positive lookbehind in (?<=\\()[0-9-]+ matches digits along with - ( [0-9-]+ ) only when they are preceded by "(", so it matches in (123-456-789 but not in 123-456-789 . Similarly, the lookahead in [0-9-]+(?=\\)) matches digits along with - only when they are followed by ")", so it matches in 123-456-789) but not in 123-456-789 . Taken together, the pattern matches only text that satisfies both conditions, as in (123-456-789) , and extracts what lies between the parentheses; it does not match cases such as (123-456-789 or 123-456-789)
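A sketch checking the four cases from that explanation. It swaps in base R's grepl() and regmatches()/gregexpr() with perl = TRUE instead of stringr, so it runs without packages; for this pattern the extraction should match what str_extract_all returns, but that equivalence is an assumption of the sketch. The variable names p and cases are illustrative.

```r
p <- '(?<=\\()[0-9-]+(?=\\))'
cases <- c("(123-456-789)", "(123-456-789", "123-456-789)", "123-456-789")

# Only the first case has "(" before AND ")" after the digits/hyphens
print(grepl(p, cases, perl = TRUE))

# Base-R analogue of str_extract_all(str2, p): one vector per element
str2 <- c("ABC (123-425-478) A", "ABC(123-423-428)", "(123-423-498) ABCDD",
          "(123-432-423)", "ABC (123-423-389) GR (124-233-848) AK")
extracted <- regmatches(str2, gregexpr(p, str2, perl = TRUE))
print(extracted)
```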

With strsplit, we can specify split as [()] . Placing ( and ) inside square brackets [] makes them a character class, so they are treated as literal characters; otherwise we need to escape them ( '\\(|\\)' ).

  strsplit(str1, '[()]')[[1]][2]
  #[1] "123-456-789"
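A sketch confirming that the character class and the escaped alternation split on the same characters, and why [2] still picks the inside text: strsplit() keeps an empty first piece when the string starts with "(", but drops the empty piece after a trailing ")". The input s is illustrative.

```r
s <- c("ABC (123-456-789)", "(123-432-423)")

# Character class vs escaped alternation: same split points
a <- strsplit(s, '[()]')
b <- strsplit(s, '\\(|\\)')
print(identical(a, b))

# Leading "(" gives an empty first piece; trailing ")" adds no empty last piece
print(a[[2]])

# So the second piece is the text inside the parentheses in both cases
print(sapply(a, `[`, 2))
```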

If a string contains several substrings to extract, we can loop over the split pieces with lapply and keep the parts containing digits with grep.

  lapply(strsplit(str2, '[()]'), function(x) grep('\\d', x, value=TRUE))
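A sketch of what that lapply call returns on the str2 data: a list with one character vector per input string, so the fifth element holds both groups. The unlist() step is an optional addition, not part of the answer, for when a single flat vector is wanted; the names pieces and nums are illustrative.

```r
str2 <- c("ABC (123-425-478) A", "ABC(123-423-428)", "(123-423-498) ABCDD",
          "(123-432-423)", "ABC (123-423-389) GR (124-233-848) AK")

pieces <- strsplit(str2, '[()]')
# Keep only the pieces that contain a digit, one vector per input string
nums <- lapply(pieces, function(x) grep('\\d', x, value = TRUE))
print(nums[[5]])

# Optional: flatten into a single vector of all extracted groups
print(unlist(nums))
```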

Or we can use stri_split_regex from stringi, which can also drop empty strings ( omit_empty=TRUE ).

  library(stringi)
  stri_split_regex(str1, '[()A-Z ]', omit_empty=TRUE)[[1]]
  #[1] "123-456-789"
  stri_split_regex(str2, '[()A-Z ]', omit_empty=TRUE)
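A base-R sketch of the same split-and-drop-empties idea, for when stringi is not installed. split_keep is a hypothetical helper, not a stringi function: it calls strsplit() and then removes empty strings with nzchar(), which approximates omit_empty=TRUE. It also shows why the range A-Z matters: without the hyphen, the class only contains "A" and "Z", so "BC" survives.

```r
str1 <- "ABC (123-456-789)"

# Hypothetical helper: split on a regex, then drop empty strings
split_keep <- function(x, pattern) {
  lapply(strsplit(x, pattern), function(p) p[nzchar(p)])
}

# The class [()A-Z ] removes parentheses, capital letters and spaces
print(split_keep(str1, '[()A-Z ]')[[1]])

# Without the hyphen the class is just "(", ")", "A", "Z" and space
print(split_keep(str1, '[()AZ ]')[[1]])
```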

Another option is rm_round from qdapRegex, if we are interested in extracting the contents inside the round brackets (parentheses).

  library(qdapRegex)
  rm_round(str1, extract=TRUE)[[1]]
  #[1] "123-456-789"
  rm_round(str2, extract=TRUE)

data

  str1 <- "ABC (123-456-789)"
  str2 <- c("ABC (123-425-478) A", "ABC(123-423-428)", "(123-423-498) ABCDD",
            "(123-432-423)", "ABC (123-423-389) GR (124-233-848) AK")
