Using regmatches and gregexpr, you get a list of the hashtags in each tweet, assuming a hashtag is a # followed by one or more letters or digits (I'm not familiar with Twitter):
foo <- c(
  "RddzAlejandra: RT @NiallOfficial: What a day for @johnJoeNevin ! Sooo proud t have been there to see him at #London2012 and here in mgar #MullingarShuffle",
  "BPOInsight: RT @atos: Atos completes delivery of key IT systems for London 2012 Olympic Games http://t.co/Modkyo2R #london2012",
  "BloombergWest: The #Olympics sets a ratings record for #NBC, with 219M viewers tuning in. http://t.co/scGzIXBp #london2012 #tech"
)
regmatches(foo, gregexpr("#(\\d|\\w)+", foo))
Return:
[[1]]
[1] "#London2012"       "#MullingarShuffle"

[[2]]
[1] "#london2012"

[[3]]
[1] "#Olympics"   "#NBC"        "#london2012" "#tech"
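As a possible follow-up, here is a minimal sketch of how the resulting list could be flattened and counted. It assumes you want case-insensitive counts with the leading # removed; the names tweets, tags, all_tags and tag_counts are just illustrative, and the tweets are shortened versions of the examples. Since \w already matches digits, the pattern is written as "#\\w+", which should match the same text as "#(\\d|\\w)+".

```r
# Shortened versions of two of the example tweets (illustrative data).
tweets <- c(
  "What a day! #London2012 and here in mgar #MullingarShuffle",
  "The #Olympics sets a ratings record for #NBC. #london2012 #tech"
)

# \\w already covers digits, so "#\\w+" is equivalent to "#(\\d|\\w)+".
tags <- regmatches(tweets, gregexpr("#\\w+", tweets))

# Flatten the list, drop the leading "#", and lowercase the tags so that
# "#London2012" and "#london2012" are counted as the same hashtag.
all_tags <- tolower(sub("^#", "", unlist(tags)))

# Frequency table of hashtags across all tweets.
tag_counts <- table(all_tags)
print(tag_counts)
```

Whether lowercasing is appropriate depends on how you want to treat tags that differ only in case.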