Regex: capture from the beginning to the nth occurrence of a character

I've been investing a lot of time in learning regular expressions, and I play with different toy scenarios. One case I can't get to work is capturing from the beginning of the line to the nth occurrence of a character, where n > 1.

Here I can grab from the beginning of the line to the first underscore, but I cannot generalize this to the second or third underscore.

    x <- c("a_b_c_d", "1_2_3_4", "<_?_._:")
    gsub("_.*$", "", x)

Here is what I'm trying to achieve with regex (`sub`/`gsub`):

    ## > sapply(lapply(strsplit(x, "_"), "[", 1:2), paste, collapse="_")
    ## [1] "a_b" "1_2" "<_?"
    # or
    ## > sapply(lapply(strsplit(x, "_"), "[", 1:3), paste, collapse="_")
    ## [1] "a_b_c" "1_2_3" "<_?_."

Related question: regular expression from first character to end of line

+6
5 answers

What about:

    gsub('^(.+_.+?).*$', '\\1', x)
    # [1] "a_b" "1_2" "<_?"

Alternatively, you can use {} to indicate the number of repetitions ...

    sub('((.+_){1}.+?).*$', '\\1', x)
    # {0} will give "a", {1} - "a_b", {2} - "a_b_c" and so on

So you don't need to repeat yourself if you want to match the nth ...
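
As a hedged sketch of the same `{}` idea, the variant below swaps `.+` for a negated character class, so the result does not depend on how greedy and lazy quantifiers interact in the regex engine. The helper name `upto_nth` is hypothetical (it is not from the answer above), and it assumes the delimiter is a literal `_`.

```r
# Hypothetical helper: keep everything before the n-th underscore.
# Assumes the delimiter is a literal "_" (no escaping is attempted here).
upto_nth <- function(x, n) {
  # n fields joined by (n - 1) underscores, then drop the rest of the line
  pat <- sprintf("^([^_]*(_[^_]*){%d}).*$", n - 1L)
  sub(pat, "\\1", x)
}

x <- c("a_b_c_d", "1_2_3_4", "<_?_._:")
upto_nth(x, 2)
# [1] "a_b" "1_2" "<_?"
upto_nth(x, 3)
# [1] "a_b_c" "1_2_3" "<_?_."
```

Strings with fewer than `n - 1` underscores do not match the pattern, so `sub` returns them unchanged.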

+3

Here's a start. To make this safe for general use, you'll need it to properly escape special regular expression characters:

    x <- c("a_b_c_d", "1_2_3_4", "<_?_._:", "", "abcd", "____abcd")

    matchToNth <- function(char, n) {
      others <- paste0("[^", char, "]*")  ## matches "[^_]*" if char is "_"
      mainPat <- paste0(c(rep(c(others, char), n-1), others), collapse="")
      paste0("(^", mainPat, ")", "(.*$)")
    }

    gsub(matchToNth("_", 2), "\\1", x)
    # [1] "a_b" "1_2" "<_?" "" "abcd" "_"

    gsub(matchToNth("_", 3), "\\1", x)
    # [1] "a_b_c" "1_2_3" "<_?_." "" "abcd" "__"
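
One way to sidestep the escaping problem entirely is to skip regex for the delimiter: the sketch below is a different technique from the pattern builder above. It locates the delimiter with fixed-string matching (`gregexpr(..., fixed = TRUE)`) and cuts with `substr`. The helper name `cut_before_nth` is hypothetical, and the choice to return strings with too few delimiters unchanged is an assumption made to mirror the outputs shown above.

```r
# Hypothetical helper: cut each string just before the n-th occurrence of a
# literal delimiter. Fixed (non-regex) matching means no escaping is needed.
cut_before_nth <- function(x, char, n) {
  hits <- gregexpr(char, x, fixed = TRUE)  # positions of every occurrence
  vapply(seq_along(x), function(i) {
    pos <- hits[[i]]
    # gregexpr() returns -1 when the delimiter does not occur at all
    if (pos[1] == -1 || length(pos) < n) return(x[i])  # too few: keep as is
    substr(x[i], 1, pos[n] - 1)
  }, character(1))
}

x <- c("a_b_c_d", "1_2_3_4", "<_?_._:", "", "abcd", "____abcd")
cut_before_nth(x, "_", 2)
# [1] "a_b"  "1_2"  "<_?"  ""     "abcd" "_"
cut_before_nth(c("a.b.c", "x.y"), ".", 2)
# [1] "a.b" "x.y"
```

Because the delimiter is never interpreted as a pattern, characters such as `.` or `?` work without special handling.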
+5

Up to the second underscore, in Perl-style regex:

    /^(.*?_.*?_)/

and the third:

 /^(.*?_.*?_.*?_)/ 
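
One way to try these Perl-style patterns from R is to pass `perl = TRUE`, which selects the PCRE engine. The following is a minimal sketch using the question's sample vector; the variable name `m2` is only for illustration.

```r
x <- c("a_b_c_d", "1_2_3_4", "<_?_._:")

# perl = TRUE selects the PCRE engine, where .*? is a lazy quantifier.
# regexpr() finds the match and regmatches() extracts it.
m2 <- regmatches(x, regexpr("^.*?_.*?_", x, perl = TRUE))
m2
# [1] "a_b_" "1_2_" "<_?_"

# Moving the final underscore outside the capture group drops it from the result.
sub("^(.*?_.*?)_.*$", "\\1", x, perl = TRUE)
# [1] "a_b" "1_2" "<_?"
```

Note that the patterns as written keep the trailing underscore in the match; the `sub` call shows one way to leave it out.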
+1

Maybe something like this:

    x
    ## [1] "a_b_c_d" "1_2_3_4" "<_?_._:"

    gsub("(.*)_", "\\1", regmatches(x, regexpr("([^_]*_){1}", x)))
    ## [1] "a" "1" "<"

    gsub("(.*)_", "\\1", regmatches(x, regexpr("([^_]*_){2}", x)))
    ## [1] "a_b" "1_2" "<_?"

    gsub("(.*)_", "\\1", regmatches(x, regexpr("([^_]*_){3}", x)))
    ## [1] "a_b_c" "1_2_3" "<_?_."
+1

Using Justin's approach, I came up with the following:

    beg2char <- function(text, char = " ", noc = 1, include = FALSE) {
      # Escape regex metacharacters before char is used anywhere in the pattern
      specchar <- c(".", "|", "(", ")", "[", "{", "^", "$", "*", "+", "?")
      if (char %in% specchar) {
        char <- paste0("\\", char)
      }
      # Either include the delimiter in the capture or make the last .+ lazy
      inc <- ifelse(include, char, "?")
      ins <- paste(rep(paste0(char, ".+"), noc - 1), collapse = "")
      pat <- paste0("^(.+", ins, inc, ").*$")
      gsub(pat, "\\1", text)
    }

    x <- c("a_b_c_d", "1_2_3_4", "<_?_._:")
    beg2char(x, "_", 1)
    beg2char(x, "_", 2)
    beg2char(x, "_", 3)
    beg2char(x, "_", 4)
    beg2char(x, "_", 3, include=TRUE)
+1