Remove characters preceding the first capital letter in a string in R

I am trying to remove all characters preceding the first capital letter in each string of a character vector:

x <- c(" its client Auto Group",  "itself and Phone Company", ", client Large Bank")

I tried:

sub('.*?[A-Z]', '', x) 

But this returns:

"uto Group"  "hone Company"   "arge Bank"

I need it to return:

"Auto Group"    "Phone Company" "Large Bank"

Any ideas?

Thanks.

1 answer

You need to use a capturing group with a backreference:

sub("^.*?([A-Z])", "\\1", x)

Here:

  • ^ - start of the string
  • .*? - any 0+ characters, as few as possible (a lazy match)
  • ([A-Z]) - capturing group 1, matching one ASCII capital letter, which is referenced with \1 in the replacement pattern (written "\\1" in an R string literal).

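So the capital letter is still consumed by the match, but the backreference puts it back in the result. Your original pattern had no group, so the letter was removed along with everything before it.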
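For comparison, here is a small sketch that runs the backreference approach next to two alternatives that are not part of the answer above: a lookahead (which needs perl = TRUE) and a negated character class. The variable names are only illustrative, and the no_caps input is a made-up edge case.

```r
x <- c(" its client Auto Group", "itself and Phone Company", ", client Large Bank")

# Approach from the answer: capture the capital letter and put it back.
with_backref <- sub("^.*?([A-Z])", "\\1", x)

# Alternative (assumption, not from the answer): a lookahead asserts that a
# capital letter follows without consuming it. Lookarounds require perl = TRUE.
with_lookahead <- sub("^.*?(?=[A-Z])", "", x, perl = TRUE)

# Alternative (assumption, not from the answer): drop the leading run of
# characters that are not ASCII capital letters.
with_negated_class <- sub("^[^A-Z]+", "", x)

print(with_backref)
# [1] "Auto Group"    "Phone Company" "Large Bank"

# Edge case: a string with no capital letter at all (hypothetical input).
no_caps <- "no capitals here"
backref_no_caps <- sub("^.*?([A-Z])", "\\1", no_caps)  # no match, unchanged
negated_no_caps <- sub("^[^A-Z]+", "", no_caps)        # whole string removed
```

All three give the same result on the sample vector. They differ when a string has no capital letter: the backreference and lookahead versions leave it unchanged, while the negated class strips it to an empty string, so pick whichever behaviour suits your data.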
