R: split on a separator that is not inside parentheses

I am trying to split a line on the pipe (|) separator:

999|150|222|(123|145)|456|12,260|(10|10000)

The catch: I do not want to split on a | that is inside parentheses. I only want to split on | characters outside of parentheses.

A plain strsplit() splits at every |, which gives results I don't want:

x <- '999|150|222|(123|145)|456|12,260|(10|10000)'
m <- strsplit(x, '\\|')

[[1]]
[1] "999"    "150"    "222"    "(123"   "145)"   "456"    "12,260" "(10"   
[9] "10000)"

I want the following result, with everything inside parentheses kept together:

[[1]]
[1] "999"        "150"        "222"        "(123|145)"  "456"       
[6] "12,260"     "(10|10000)"

Any help is appreciated.

3 answers

You can turn on PCRE with perl=T and use a bit of dark magic:

x <- '999|150|222|(123|145)|456|12,260|(10|10000)'
strsplit(x, '\\([^)]*\\)(*SKIP)(*F)|\\|', perl=T)

# [[1]]
# [1] "999"        "150"        "222"        "(123|145)"  "456"       
# [6] "12,260"     "(10|10000)"

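The idea is to skip the contents of parentheses. On the left side of the alternation operator, \\([^)]*\\) matches a parenthesized group, and (*SKIP)(*F) makes that match fail and tells the regex engine not to retry any position inside it. The right side of the alternation matches | (outside of parentheses, which is what we want to split on).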
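To see which pipes the pattern actually treats as separators, you can pass the same regex to gregexpr() and look at the match positions. A minimal sketch, wrapping the answer's pattern in a helper called split_outside_parens() (a hypothetical name, not from any package) and assuming the parentheses are balanced and not nested:

```r
# Hypothetical helper: split on "|" only when it is outside (...) groups.
# Assumes parentheses are balanced and not nested.
split_outside_parens <- function(s) {
  # Left alternative: match a whole (...) group; (*SKIP)(*F) then discards it
  # and resumes the search after the group. Right alternative: a bare pipe.
  pattern <- "\\([^)]*\\)(*SKIP)(*F)|\\|"
  strsplit(s, pattern, perl = TRUE)[[1]]
}

x <- "999|150|222|(123|145)|456|12,260|(10|10000)"

# Character positions of the pipes the pattern matches, i.e. the separators.
# The pipes at positions 17 and 37 sit inside parentheses and are skipped.
sep_pos <- as.integer(gregexpr("\\([^)]*\\)(*SKIP)(*F)|\\|", x, perl = TRUE)[[1]])
print(sep_pos)

print(split_outside_parens(x))
```

Because the skipped group is consumed as a whole, no position inside it is ever tried as a separator.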


Another option is to turn the parentheses into single quotes and let scan() do the splitting:

scan(text=gsub("\\(|\\)", "'", x), what='', sep="|")
#[1] "999"      "150"      "222"      "123|145"  "456"      "12,260"   "10|10000"
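This works because scan() treats ' as a quote character whenever sep is not a newline, so after the parentheses become single quotes, a | between them is read as part of one quoted field. A small sketch of the same call with quiet = TRUE, which suppresses the "Read 7 items" message; it assumes the fields contain no other quote characters, and note that the parentheses are not part of the result:

```r
x <- "999|150|222|(123|145)|456|12,260|(10|10000)"

# Replace "(" and ")" with single quotes so that scan() reads '123|145'
# as one quoted field instead of splitting it at the inner pipe.
quoted <- gsub("\\(|\\)", "'", x)

# With sep = "|", scan()'s default quote set includes both ' and ".
fields <- scan(text = quoted, what = "", sep = "|", quiet = TRUE)
print(fields)
```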

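This drops the parentheses from the output and does not use strsplit. If you need strsplit, you can use a negative lookahead so that a | followed by digits and a closing parenthesis is not split: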

strsplit(x, "\\|(?!\\d+\\))", perl=TRUE)
# [1] "999"        "150"        "222"        "(123|145)"  "456"        "12,260"     "(10|10000)"
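The lookahead above only protects a | that is followed by digits and then ")", so it relies on the parenthesized values being numeric, as they are in the question. A sketch comparing it with a slightly more general lookahead (a variation, not one of the posted answers) on a hypothetical input with letters inside the parentheses:

```r
x <- "999|150|222|(123|145)|456|12,260|(10|10000)"
y <- "a|(b|c)|d"  # hypothetical input: non-numeric text inside the parentheses

# Lookahead from the answer above: do not split on a "|" that is followed by
# one or more digits and then ")".
digits_only <- "\\|(?!\\d+\\))"

# Variation (not from the posted answers): do not split on a "|" if a ")"
# appears before the next "(", i.e. the pipe sits inside an open group.
# Assumes groups are not nested.
any_content <- "\\|(?![^(]*\\))"

r_digits <- strsplit(y, digits_only, perl = TRUE)[[1]]
r_any_y  <- strsplit(y, any_content, perl = TRUE)[[1]]
r_any_x  <- strsplit(x, any_content, perl = TRUE)[[1]]

print(r_digits)  # the inner pipe is split: "a" "(b" "c)" "d"
print(r_any_y)   # the group is kept: "a" "(b|c)" "d"
print(r_any_x)   # same seven fields as the desired output in the question
```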

x <- '999|150|222|(123|145)|456|12,260|(10|10000)'
m <- strsplit(x, '\\|(?=[^)]+(\\||$))', perl=T)

# [[1]]
# [1] "999"        "150"        "222"        "(123|145)"  "456"        "12,260"    
# [7] "(10|10000)"

Here we do not split on every |; the lookahead also checks that there is no ")" before the next | or the end of the string. Note that this approach neither requires nor verifies that the parentheses are balanced and closed; it assumes your input is well formed.
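The regex answers are written for flat, non-nested groups (the first answer's \\([^)]*\\), for example, ends a group at its first ")", so "(1|(2)|3)" would be split at its last |). If nesting or validation matters, a non-regex alternative is to walk the string once and split on | only at parenthesis depth zero. A minimal sketch; split_top_level() is a hypothetical name, and unlike the lookahead it stops with an error on unbalanced parentheses:

```r
# Hypothetical helper: split s on `sep` (a single character) only at
# parenthesis depth 0. Nested groups are kept intact; unbalanced
# parentheses raise an error.
split_top_level <- function(s, sep = "|") {
  chars <- strsplit(s, "")[[1]]
  depth <- 0L
  fields <- character(0)
  current <- character(0)
  for (ch in chars) {
    if (ch == "(") {
      depth <- depth + 1L
    } else if (ch == ")") {
      depth <- depth - 1L
      if (depth < 0L) stop("unbalanced ')' in input")
    }
    if (ch == sep && depth == 0L) {
      # Top-level separator: close the current field and start a new one
      fields <- c(fields, paste(current, collapse = ""))
      current <- character(0)
    } else {
      current <- c(current, ch)
    }
  }
  if (depth != 0L) stop("unbalanced '(' in input")
  c(fields, paste(current, collapse = ""))
}

x <- "999|150|222|(123|145)|456|12,260|(10|10000)"
print(split_top_level(x))
print(split_top_level("1|(2|(3)|4)|5"))  # nested group stays in one field
```

The loop is slower than a single regex call on long strings, but its behavior on nested or malformed input is explicit.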
