Grab from start to first occurrence of character with gsub

Question


I have the following regular expression, which I would like to capture everything from the very beginning of the sentence up to the first ## . I could use strsplit, as I demonstrate below, but I would prefer a gsub solution. If gsub is not the right tool (I think it is), I would prefer a base R solution, because I want to learn the base regex tools.

 x <- "gfd gdr tsvfvetrv erv tevgergre ## vev fe ## vgrrgf"
 strsplit(x, "##")[[c(1, 1)]]   # works
 gsub("(.*)(##.*)", "\\1", x)   # I want this to work

+7

regex r

Tyler Rinker Nov 28 '12 at 15:47


6 answers

I would say:

 sub("##.*", "", x)

This deletes everything from the first occurrence of ## onward, including the ## itself.

+4

Sacha Epskamp Nov 28 '12 at 15:53


In this case, I would do it the other way round, that is, replace everything from the first # onward with an empty string:

 gsub("#.*$", "", x)
 [1] "gfd gdr tsvfvetrv erv tevgergre "

But you can also use the non-greedy modifier ? to make your regex work the way you intended:

 gsub("(.*?)#.*$", "\\1", x)
 [1] "gfd gdr tsvfvetrv erv tevgergre "

+3

Andrie Nov 28 '12 at 15:54


Try this as your regex:

 ^[^#]+

It starts at the beginning of the line and matches any run of characters that are not #, stopping just before the first # .
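The pattern above is given without any R code around it. A minimal sketch of one way to apply it in base R, assuming the x from the question and using regexpr with regmatches to extract the match rather than replace it (the variable names m and first_part are only illustrative):

```r
x <- "gfd gdr tsvfvetrv erv tevgergre ## vev fe ## vgrrgf"

# regexpr() locates the first match of the pattern in x;
# regmatches() then pulls out the matched text.
m <- regexpr("^[^#]+", x)
first_part <- regmatches(x, m)

print(first_part)
```

Note that this stops at the first single # , not specifically at ## , and that regmatches returns character(0) when the string begins with # because the pattern then has nothing to match.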

+1

garyh Nov 28 '12 at 15:50


There are already some simpler answers, but since you said in your question that you want to learn about the regex support in base R, here is another way, using a positive lookahead assertion (?=#) and the ungreedy option (?U):

 regmatches(x, regexpr('(?U)^.+(?=#)', x, perl=TRUE))
 [1] "gfd gdr tsvfvetrv erv tevgergre "

+1

Matthew Plourde Nov 28 '12 at 16:02


Here's another approach that uses a few more string tools in place of a more complex regex. First it finds the location of the first ##, and then it extracts the substring up to that point:

 library(stringr)
 x <- "gfd gdr tsvfvetrv erv tevgergre ## vev fe ## vgrrgf"
 loc <- str_locate(x, "##")
 str_sub(x, 1, loc[, "start"] - 1)

As a rule, I think this phased approach is more convenient than complex regular expressions.
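Since the question asks for base R tools, the same locate-then-extract idea can be sketched without stringr. This is only an illustration, not part of the answer above: it assumes regexpr with fixed = TRUE to find the literal ## and substr to cut the string, and the names pos and before are made up for the example.

```r
x <- "gfd gdr tsvfvetrv erv tevgergre ## vev fe ## vgrrgf"

# Position of the first literal "##" (fixed = TRUE turns off regex syntax);
# regexpr() returns -1 when there is no match.
pos <- as.integer(regexpr("##", x, fixed = TRUE))

# Keep everything before the match, or the whole string if "##" is absent.
before <- if (pos > 0) substr(x, 1, pos - 1) else x

print(before)
```

The explicit check on pos is a design choice for this sketch: it makes the no-match case return the input unchanged instead of an empty string.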

+1

hadley Nov 28 '12 at 16:48


Josh O'Brien · Accepted Answer · 2012-11-28T15:56:41+0000

Just add one character, putting a ? after the first quantifier, to make it "non-greedy":

 gsub("(.*?)(##.*)", "\\1", x)
 # [1] "gfd gdr tsvfvetrv erv tevgergre "

Here's the relevant documentation, from ?regex:

By default repetition is greedy, so the maximal possible number of repeats is used. This can be changed to 'minimal' by appending ? to the quantifier.
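To see the difference the ? makes, here is a small side-by-side sketch on the x from the question. The variable names greedy and lazy are only illustrative.

```r
x <- "gfd gdr tsvfvetrv erv tevgergre ## vev fe ## vgrrgf"

# Greedy: (.*) takes as much as it can, so the second group can only
# begin at the LAST "##" and group 1 still contains the first "##".
greedy <- gsub("(.*)(##.*)", "\\1", x)

# Non-greedy: (.*?) takes as little as it can, so the second group
# begins at the FIRST "##".
lazy <- gsub("(.*?)(##.*)", "\\1", x)

print(greedy)
print(lazy)
```

The greedy version keeps the text up to the last ## , which is why the original attempt in the question did not give the desired result.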
