Remove URLs from strings

I have the following line stored in the sentence object:

 sentence <- "aazdlubtirol: RT @tradeDayTrades: sister articles \"$AAPL Dancing in a Burning Room\" January 2013 http://t.co/tkuCRfLy \" $AAPL vs $AAPL \" August 2011 http://t.co/863HkVjn" 

I am trying to use gsub to remove URLs starting with http:

sentence <- gsub('http.*','',sentence)

However, it removes everything from the first http to the end of the string:

aazdlubtirol: RT @tradeDayTrades: sister articles \"$AAPL Dancing in a Burning Room\" January 2013

The result I want is:

aazdlubtirol: RT @tradeDayTrades: sister articles \"$AAPL Dancing in a Burning Room\" January 2013 \" $AAPL vs $AAPL \" August 2011

I am trying to strip out the URLs: wherever the string contains http, I want to remove that URL and nothing else. I found some solutions, but they did not work for me.

1 answer

A first attempt is to add a space to the pattern being replaced:

 gsub('http.* *', '', sentence) 

Or use \\s, which matches any whitespace character:

 gsub('http.*\\s*', '', sentence) 

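As a comment pointed out, however, .* matches any character, including spaces, and regular-expression quantifiers are greedy, so both patterns above still remove everything after the first http. Instead, match http followed by one or more non-whitespace characters, followed by zero or more whitespace characters: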

 gsub('http\\S+\\s*', '', sentence) 
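
To see the difference between the greedy pattern and the non-whitespace pattern, here is a minimal sketch run on the string from the question. The variable names greedy and cleaned are only illustrative, and the trimws() call is an addition not in the original answer: it assumes you also want to drop the space left behind in front of the final URL.

```r
# Input string from the question
sentence <- "aazdlubtirol: RT @tradeDayTrades: sister articles \"$AAPL Dancing in a Burning Room\" January 2013 http://t.co/tkuCRfLy \" $AAPL vs $AAPL \" August 2011 http://t.co/863HkVjn"

# Greedy: .* runs to the end of the string, so everything
# after the first "http" is removed along with the URL
greedy <- gsub('http.*', '', sentence)

# \\S+ stops at the first whitespace character, so only each URL
# (plus any whitespace that follows it) is removed
cleaned <- gsub('http\\S+\\s*', '', sentence)

# The space that preceded the last URL is still there; trimws() drops it
cleaned <- trimws(cleaned)

print(greedy)
print(cleaned)
```

If the URLs may also appear in forms that do not start with http (for example a bare www. prefix), the pattern would need to be extended; this sketch only covers the case asked about.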