Remove anything inside a pair of angle brackets using gsub in R

Suppose I have a line as shown below:

<a>b<c>

I want to delete both <a> and <c>, but I cannot use gsub("<.*>","","<a>b<c>"), because this will also delete b.

I asked a similar question before, but having thought it over, I think I should learn how to deal with such problems in general. Thanks.

+5
3 answers

Do not allow the closing bracket > in the material between the brackets:

z <- "<a>b<c>"
gsub("<[^>]+>","",z)
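
To see why this works, here is a small side-by-side sketch of the greedy pattern from the question and the negated character class. The variable names greedy and per_tag are only for illustration.

```r
z <- "<a>b<c>"

# Greedy: .* runs on to the LAST ">" in the string,
# so the whole input is a single match and everything is removed.
greedy <- gsub("<.*>", "", z)

# Negated class: [^>]+ cannot cross a ">", so the match stops at the
# first closing bracket and each tag is removed separately.
per_tag <- gsub("<[^>]+>", "", z)

print(greedy)   # empty string
print(per_tag)  # "b"
```

Note that + requires at least one character between the brackets, so an empty pair <> would be left in place; [^>]* would remove that as well.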
+11

You can use a non-greedy regular expression, for example /<.*?>/.
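
In R this could look roughly like the sketch below. Passing perl = TRUE is my own choice here, to make it explicit which regex engine interprets the lazy quantifier.

```r
z <- "<a>b<c>"

# .*? is lazy: it matches as few characters as possible,
# so each match ends at the first ">" after the opening "<".
lazy <- gsub("<.*?>", "", z, perl = TRUE)

print(lazy)  # "b"
```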

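Be careful if the input is HTML, though. For real HTML, an HTML parser is the safer choice, because a > can legally appear inside an attribute value, for example: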

<span title="Help > Index">
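
A quick sketch of how a bracket-stripping regex goes wrong on that kind of input. The surrounding text and closing tag are added here only to make the example complete.

```r
# Hypothetical input: the span above with some content and a closing tag.
html <- '<span title="Help > Index">text</span>'

# The pattern treats the ">" inside the attribute value as the end of
# the tag, so part of the attribute is left behind in the result.
stripped <- gsub("<[^>]+>", "", html)

print(stripped)  # leaves ' Index">text' rather than just 'text'
```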
+4

Another idea, often very useful in noisy settings (i.e. when the task starts to look like writing a tokenizer), is to split the string on the brackets and pick out the piece you need:

strsplit("<a>b<c>",split='<|>')[[1]][3]
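
The fixed index [3] only fits this exact string. A slightly more general sketch, assuming you want every piece of text that sits outside the brackets, splits on whole tags and drops the empty pieces. The helper name text_outside_tags is made up for this example.

```r
# Hypothetical helper: return all non-empty pieces of text outside <...>.
text_outside_tags <- function(x) {
  # Split on whole tags; [^>]* keeps each separator from crossing a ">".
  pieces <- strsplit(x, "<[^>]*>")[[1]]
  # strsplit yields "" when a tag starts the string; drop such pieces.
  pieces[nzchar(pieces)]
}

print(text_outside_tags("<a>b<c>"))    # "b"
print(text_outside_tags("x<a>b<c>d"))  # "x" "b" "d"
```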
+4