I am looking for a fast way to extract three arrays of strings from a tweet's text: (1) hashtags, (2) user mentions, and (3) URLs. The tweet texts are stored in a CSV file.
My current approach takes too long to process, and I wonder whether I can optimize it. Below are my regexes for each type of match. Rather than post the full code, I only show how I match hashtags; URLs and user mentions are matched the same way.
Here it is:
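```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TweetParser { // hypothetical class name
    public static final String hashtagRegex = "^#\\w+|\\s#\\w+";
    public static final Pattern hashtagPattern = Pattern.compile(hashtagRegex);
    // "https?" makes only the "s" optional; "http+" would also accept "httpp"
    public static final String urlRegex = "https?://\\S+";
    public static final Pattern urlPattern = Pattern.compile(urlRegex);
    public static final String mentionRegex = "^@\\w+|\\s@\\w+";
    public static final Pattern mentionPattern = Pattern.compile(mentionRegex);

    public static String[] getHashtag(String text) {
        List<String> hashtags = new ArrayList<>();
        Matcher matcher = hashtagPattern.matcher(text);
        // Each find() advances to the next match, so call it exactly once per
        // iteration (an extra find() before the loop would skip the first hashtag)
        while (matcher.find()) {
            // Drop the leading whitespace character (if any) and the '#'
            hashtags.add(matcher.group().trim().substring(1));
        }
        return hashtags.toArray(new String[0]);
    }
}
```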
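One possible optimization, sketched here under stated assumptions rather than as a definitive answer: instead of scanning each tweet three times with three separate patterns, combine them into one precompiled pattern with named groups and sort the matches in a single pass. The class `TweetEntityExtractor` and its fields are hypothetical. Hashtags and mentions keep the question's rule (start of text or preceded by whitespace), expressed as a `(?<!\S)` lookbehind so the whitespace is not part of the match, and URLs use `https?://\S+`. Whether this is measurably faster depends on the data, so it is worth timing against the original.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts hashtags, mentions and URLs in one regex pass.
public class TweetEntityExtractor {

    // Compiled once and reused for every tweet. (?<!\S) means "at the start of
    // the text or right after whitespace", mirroring the original ^...|\s... rules.
    private static final Pattern ENTITY_PATTERN = Pattern.compile(
            "(?<url>https?://\\S+)"
            + "|(?<!\\S)#(?<hashtag>\\w+)"
            + "|(?<!\\S)@(?<mention>\\w+)");

    public final List<String> hashtags = new ArrayList<>();
    public final List<String> mentions = new ArrayList<>();
    public final List<String> urls = new ArrayList<>();

    private TweetEntityExtractor() {
    }

    public static TweetEntityExtractor extract(String text) {
        TweetEntityExtractor result = new TweetEntityExtractor();
        Matcher m = ENTITY_PATTERN.matcher(text);
        while (m.find()) {
            // group(name) returns null when that alternative did not match
            if (m.group("url") != null) {
                result.urls.add(m.group("url"));
            } else if (m.group("hashtag") != null) {
                result.hashtags.add(m.group("hashtag"));
            } else {
                result.mentions.add(m.group("mention"));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        TweetEntityExtractor e = extract("#java tips from @alice: https://example.com/x #regex");
        System.out.println(e.hashtags); // [java, regex]
        System.out.println(e.mentions); // [alice]
        System.out.println(e.urls);     // [https://example.com/x]
    }
}
```

Because the named groups capture only the tag or name, no `replace` calls are needed afterwards, and each tweet string is traversed by a single matcher instead of three. The resulting lists can be turned into arrays with `toArray(new String[0])` if arrays are required.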