Grab from start to first occurrence of character with gsub

I have the following regular expression, with which I would like to capture everything from the very beginning of the string up to the first ## . I could use strsplit, as I demonstrate below, but I would prefer a gsub solution. If gsub is not the right tool (I think it is), I would still prefer a base R solution, because I want to learn the base regex tools.

 x <- "gfd gdr tsvfvetrv erv tevgergre ## vev fe ## vgrrgf"
 strsplit(x, "##")[[c(1, 1)]]  # works
 gsub("(.*)(##.*)", "\\1", x)  # I want this to work
6 answers

Just add one character: put ? after the first quantifier to make it non-greedy:

 gsub("(.*?)(##.*)", "\\1", x)
 # [1] "gfd gdr tsvfvetrv erv tevgergre "

Here's the relevant documentation, from ?regex:

By default repetition is greedy, so the maximal possible number of repeats is used. This can be changed to 'minimal' by appending ? to the quantifier.
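
To see the difference on the string from the question, here is a minimal sketch that compares the two quantifiers side by side. The variable names greedy and lazy are only illustrative and do not come from the answer.

```r
x <- "gfd gdr tsvfvetrv erv tevgergre ## vev fe ## vgrrgf"

# Greedy: (.*) grabs as much as possible, so the first group
# extends all the way to the LAST "##".
greedy <- sub("(.*)(##.*)", "\\1", x)

# Non-greedy: (.*?) grabs as little as possible, so the first group
# stops just before the FIRST "##".
lazy <- sub("(.*?)(##.*)", "\\1", x)

print(greedy)
print(lazy)
```

Because the pattern consumes the whole string in a single match, sub and gsub should behave the same way here.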


I would use:

 sub("##.*", "", x) 

This removes everything from the first occurrence of ## onward, including the ## itself.


In this case I would go the other way around, that is, replace everything from the first # onward with an empty string:

 gsub("#.*$", "", x)
 # [1] "gfd gdr tsvfvetrv erv tevgergre "

But you can also use the non-greedy modifier ? to make your regex work the way you intended:

 gsub("(.*?)#.*$", "\\1", x)
 # [1] "gfd gdr tsvfvetrv erv tevgergre "

Try this as your regex:

 ^[^#]+ 

It starts at the beginning of the string and matches any characters other than #, up to the first # .
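
The answer gives only the pattern, so the following is a sketch of one way it might be applied in base R. The choice of regmatches() with regexpr() is an assumption, and y is a made-up string used to show a caveat: the character class stops at the first single #, not specifically at ## .

```r
x <- "gfd gdr tsvfvetrv erv tevgergre ## vev fe ## vgrrgf"

# regexpr() locates the first match; regmatches() extracts the matched text.
head_part <- regmatches(x, regexpr("^[^#]+", x))
print(head_part)

# Caveat: a lone "#" also ends the match, even if "##" appears later.
y <- "a # b ## c"   # hypothetical input, not from the question
head_y <- regmatches(y, regexpr("^[^#]+", y))
print(head_y)
```

If the text before the first ## may itself contain a single #, one of the ##-based patterns from the other answers is probably the safer choice.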


There are already some simpler answers, but since you indicated in your question that you want to learn about regular expression support in base R, here is another way, using a positive lookahead (?=#) and the ungreedy option (?U) .

 regmatches(x, regexpr('(?U)^.+(?=#)', x, perl=TRUE))
 # [1] "gfd gdr tsvfvetrv erv tevgergre "

Here's another approach that relies on string tools rather than a more complex regex. First it finds the location of the first ##, and then it extracts the substring up to that point:

 library(stringr)
 x <- "gfd gdr tsvfvetrv erv tevgergre ## vev fe ## vgrrgf"
 loc <- str_locate(x, "##")
 str_sub(x, 1, loc[, "start"] - 1)

In general, I find this kind of step-by-step approach easier to work with than complex regular expressions.
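
Since the question asks for base R tools, the same locate-then-extract idea can be sketched without stringr. This is an illustrative equivalent rather than part of the answer: it assumes regexpr() with fixed = TRUE to find the literal ## and substr() to cut the string, and the names pos and head_part are hypothetical.

```r
x <- "gfd gdr tsvfvetrv erv tevgergre ## vev fe ## vgrrgf"

# Position of the first literal "##" (fixed = TRUE turns off regex syntax).
# regexpr() returns -1 when there is no match; as.integer() drops the
# match attributes so only the position is kept.
pos <- as.integer(regexpr("##", x, fixed = TRUE))

# Keep everything before the match, or the whole string if "##" is absent.
head_part <- if (pos > 0) substr(x, 1, pos - 1) else x
print(head_part)
```

The explicit check on pos covers the case where the separator is missing, which the one-line regex solutions handle implicitly by leaving the string unchanged.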

