Regular expression for removing leading characters up to the first digit encountered

I have a string called thisLine and I would like to remove all characters before the first digit. I can use the command

 regexpr("[0123456789]", thisLine)[1]

to determine the position of the first digit. How can I use this index to split the string?


Short answer:

 sub('^\\D*', '', thisLine) 

Where

  • ^ matches the start of the line
  • \\D matches any non-digit character (the opposite of \\d)
  • \\D* matches as many consecutive non-digit characters as possible
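A quick check of the sub() approach on a few inputs. The vector x is a hypothetical example (thisLine in the question is a single string); sub() is vectorised, so it works the same way on one string or many. Note that a string with no digit at all comes back empty, because \\D* then matches the whole string.

```r
# Hypothetical sample inputs
x <- c("abc123def", "  42 apples", "7up", "no digits here")

# ^\\D* matches the run of non-digit characters at the start of each string;
# replacing it with "" keeps everything from the first digit onwards
result <- sub("^\\D*", "", x)
print(result)
```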

You need the substring function.
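A sketch of using the position returned by regexpr() with substring() to split the string into the part before the first digit and the rest. The example value and the names pos, before and after are illustrative only. regexpr() returns -1 when there is no match, so that case is handled separately (here, treating the whole string as the prefix is an assumed choice).

```r
thisLine <- "abc123def"  # hypothetical example value

# Position of the first digit; -1 if the string contains no digit
pos <- regexpr("[0123456789]", thisLine)[1]

if (pos > 0) {
  before <- substring(thisLine, 1, pos - 1)  # characters before the first digit
  after  <- substring(thisLine, pos)         # from the first digit to the end
} else {
  before <- thisLine                         # no digit: nothing to split off
  after  <- ""
}
```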

Or use gsub to do it in a single step:

 > gsub('^[^[:digit:]]*[[:digit:]]', '', 'abc1def')
 [1] "def"

If you want to keep that first digit, you can capture it and put it back:

 > gsub('^[^[:digit:]]*([[:digit:]])', '\\1', 'abc1def')
 [1] "1def"
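Since both patterns are anchored with ^, sub(), which replaces only the first match, is sufficient here. A sketch applying both variants to a hypothetical character vector; unlike ^\\D*, these patterns require a digit, so a string without one is returned unchanged.

```r
# Hypothetical sample inputs
x <- c("abc1def", "id: 42", "no digits")

# Keep the first digit: capture it and put it back with \\1
keep_digit <- sub("^[^[:digit:]]*([[:digit:]])", "\\1", x)

# Drop the first digit together with the leading non-digits
drop_digit <- sub("^[^[:digit:]]*[[:digit:]]", "", x)

print(keep_digit)
print(drop_digit)
```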

Or, as flodel and Alan point out, just replace “all leading non-digits” with an empty string. See flodel's answer.


My personal preference, with the regex broken down in comments:

 sub("^.*?(\\d)", "\\1", thisLine)
 # breaking down the regex:
 # ^    beginning of line
 # .    any character
 # *    repeated any number of times (including 0)
 # ?    minimal quantifier (match the fewest characters possible with *)
 # ()   groups the digit
 # \\d  digit
 # \\1  backreference to the first captured group (the digit)
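The ? is what makes this work. A small comparison on a hypothetical string with more than one run of digits: with the lazy .*? the match stops at the first digit, while the greedy .* consumes as much as it can, so the group captures only the last digit and the earlier digits are lost.

```r
thisLine <- "ab12cd34"  # hypothetical example value

# Lazy: .*? stops just before the first digit, so everything from it is kept
lazy <- sub("^.*?(\\d)", "\\1", thisLine)

# Greedy: .* runs as far as possible, leaving only the last digit for (\\d)
greedy <- sub("^.*(\\d)", "\\1", thisLine)

print(lazy)
print(greedy)
```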