Removing everything after a character in a column in R

I need to delete everything after the question mark in the column.

I have an example dataset:

  my.data
  BABY  MOM    LANDING
  mark  dina   www.example.com/?kdvhzkajvkadjf
  tom   becky  www.example.com/?ghkadkho[qeu
  brad  tina   www.example.com/?klsdfngal;j

I want my new data to be:

  new.data
  BABY  MOM    LANDING
  mark  dina   www.example.com/?
  tom   becky  www.example.com/?
  brad  tina   www.example.com/?

How do I tell R to delete everything after the ? in my.data$LANDING?

1 answer

We can use sub to remove the characters after the ?. We use a positive lookbehind ( (?<=\\?).* ) to match zero or more characters ( .* ) that are preceded by ? and replace them with an empty string ( '' ).

  my.data$LANDING <- sub('(?<=\\?).*$', '', my.data$LANDING, perl=TRUE)
  my.data
  #  BABY   MOM           LANDING
  #1 mark  dina www.example.com/?
  #2  tom becky www.example.com/?
  #3 brad  tina www.example.com/?
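
For a self-contained check of the lookbehind approach, the sketch below rebuilds the example data and applies the same sub() call. The data.frame() construction (including stringsAsFactors = FALSE) is an assumption, since the question only shows the printed table. perl = TRUE matters here because lookbehind is PCRE syntax, which R's default regex engine does not accept.

```r
# Rebuild the example data (assumed construction; the question shows
# only the printed table, not how my.data was created)
my.data <- data.frame(
  BABY = c("mark", "tom", "brad"),
  MOM = c("dina", "becky", "tina"),
  LANDING = c("www.example.com/?kdvhzkajvkadjf",
              "www.example.com/?ghkadkho[qeu",
              "www.example.com/?klsdfngal;j"),
  stringsAsFactors = FALSE
)

# Delete everything that follows the first "?". The "?" itself is kept
# because the lookbehind only asserts its presence without consuming it.
my.data$LANDING <- sub("(?<=\\?).*$", "", my.data$LANDING, perl = TRUE)
my.data$LANDING
```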

Another option is to use a capture group and pass a backreference to it ( \\1 ) as the replacement (the second argument).

  my.data$LANDING <- sub('([^?]+\\?).*', '\\1', my.data$LANDING) 

Here we match one or more characters that are not ? ( [^?]+ ) followed by a ? ( \\? ), and wrap both in parentheses to capture them as a group ( ([^?]+\\?) ). The remaining characters ( .* ) are matched outside the group, so they are dropped when the match is replaced with \\1.
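
As a rough illustration of how the capture-group pattern behaves on a few different inputs, here is a small sketch. The strings in x are made up for this example and are not from the question.

```r
# Hypothetical test strings, not from the original question
x <- c("www.example.com/?abc",  # one "?"
       "www.example.com/?a?b",  # two "?" characters
       "www.example.com/")      # no "?" at all

# Keep the text up to and including the first "?" (the captured group)
# and drop whatever .* matched after it
res <- sub("([^?]+\\?).*", "\\1", x)
res
```

With more than one ?, the string is cut after the first one, because [^?]+ cannot run past a ?. A string without any ? does not match the pattern, so sub() returns it unchanged.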

Or, as @Frank mentioned in the comments, we can match the ? character together with everything after it ( \\?.* ) and replace the whole match with a literal ? ( \\? as the second argument).

  my.data$LANDING <- sub("\\?.*","\\?",my.data$LANDING) 
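
To check that the three patterns agree on the example URLs, a quick comparison could look like the sketch below. The vector landing and the result names are hypothetical and used only for this illustration.

```r
# The LANDING values from the question, as a plain character vector
# (the name `landing` is only for this illustration)
landing <- c("www.example.com/?kdvhzkajvkadjf",
             "www.example.com/?ghkadkho[qeu",
             "www.example.com/?klsdfngal;j")

# 1. Lookbehind: remove what follows the "?" (needs perl = TRUE)
lookbehind <- sub("(?<=\\?).*$", "", landing, perl = TRUE)

# 2. Capture group: keep the part up to and including the "?"
captured <- sub("([^?]+\\?).*", "\\1", landing)

# 3. Match "?" plus the rest, then put a literal "?" back
rematch <- sub("\\?.*", "\\?", landing)

# TRUE when all three approaches produce the same vector
all_same <- identical(lookbehind, captured) && identical(captured, rematch)
all_same
```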