A quick way to extract hashtags, user mentions, and URLs from tweet text?

I am looking for a fast way to build three arrays of strings from a tweet's text: one for hashtags, one for user mentions, and one for URLs. The tweet text is stored in a CSV file.

My current solution takes too long to run, and I wonder whether the code can be optimized. Below are the regex patterns for each type of match. I will not post the whole program, only the method that matches hashtags; URLs and user mentions are handled the same way.

Here it is:

public static String hashtagRegex = "^#\\w+|\\s#\\w+";
public static Pattern hashtagPattern = Pattern.compile(hashtagRegex);

public static String urlRegex = "http+://[\\S]+|https+://[\\S]+";
public static Pattern urlPattern = Pattern.compile(urlRegex);

public static String mentionRegex = "^@\\w+|\\s@\\w+";
public static Pattern mentionPattern = Pattern.compile(mentionRegex);

public static String[] getHashtag(String text) {
    String[] hashtags = new String[0];
    Matcher matcher = hashtagPattern.matcher(text);

    if ( matcher.find() ) {
        hashtags = new String[matcher.groupCount()];
        for ( int i = 0; matcher.find(); i++ ) {
            // Also, I'm getting an ArrayIndexOutOfBoundsException here
            hashtags[i] = matcher.group().replace(" ", "").replace("#", "");
        }
    }

    return hashtags;
}
1 answer

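Matcher#groupCount returns the number of capturing groups in the pattern, not the number of matches. That is why you get the ArrayIndexOutOfBoundsException (the array is created with length zero, because your pattern has no capturing groups). Collect the matches in a List instead, which grows as needed.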
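
Because Matcher#groupCount reports the capturing groups declared in the pattern rather than the matches found, one way around the exception is to fill a List inside a `while (matcher.find())` loop. The following is a minimal sketch, not the asker's actual code: the class name `HashtagExtractor`, the method name `getHashtags`, and the lookbehind form of the pattern are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name is not from the original post.
class HashtagExtractor {
    // "(?<!\\S)" means "not preceded by a non-whitespace character", i.e. the
    // '#' sits at the start of the text or right after whitespace. The capturing
    // group holds the tag without '#', so no replace() calls are needed.
    private static final Pattern HASHTAG = Pattern.compile("(?<!\\S)#(\\w+)");

    static List<String> getHashtags(String text) {
        List<String> hashtags = new ArrayList<>();
        Matcher matcher = HASHTAG.matcher(text);
        // A single loop visits every match, including the first one.
        while (matcher.find()) {
            hashtags.add(matcher.group(1));
        }
        return hashtags;
    }

    public static void main(String[] args) {
        // prints [java, performance]
        System.out.println(getHashtags("#java regex tips #performance a#b"));
    }
}
```

Note that in the question's code the `matcher.find()` call in the `if` condition already consumes the first match, so the `for` loop starts at the second one. A single `while` loop avoids that as well. If an array is required, `hashtags.toArray(new String[0])` converts the result.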

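A (possibly) faster approach is to split the text on whitespace and then check whether each token starts with http, @ or #. That way you avoid regular expressions entirely. (I have not profiled this, though.)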
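
Since URLs, mentions, and hashtags begin with http, @, and # respectively, a regex-free option is to split the text on whitespace and test each token's prefix. The following is a rough sketch under stated assumptions: `TweetTokenScanner`, `scan`, and the three list fields are hypothetical names, and whether this is actually faster than the compiled patterns would need to be measured on real data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical holder for the three result lists; not from the original post.
class TweetTokenScanner {
    final List<String> hashtags = new ArrayList<>();
    final List<String> mentions = new ArrayList<>();
    final List<String> urls = new ArrayList<>();

    static TweetTokenScanner scan(String text) {
        TweetTokenScanner result = new TweetTokenScanner();
        // The default delimiters are space, tab, newline, carriage return, form feed.
        StringTokenizer tokens = new StringTokenizer(text);
        while (tokens.hasMoreTokens()) {
            String token = tokens.nextToken();
            if (token.startsWith("http://") || token.startsWith("https://")) {
                result.urls.add(token);
            } else if (token.length() > 1 && token.charAt(0) == '#') {
                result.hashtags.add(token.substring(1)); // drop the leading '#'
            } else if (token.length() > 1 && token.charAt(0) == '@') {
                result.mentions.add(token.substring(1)); // drop the leading '@'
            }
        }
        return result;
    }

    public static void main(String[] args) {
        TweetTokenScanner r = scan("@alice check https://example.com #java now");
        System.out.println(r.hashtags + " " + r.mentions + " " + r.urls);
    }
}
```

One difference from the `\w+` patterns: a token keeps everything up to the next whitespace, so `#java,` yields `java,` with the trailing comma. If that matters, trailing punctuation has to be trimmed in a separate step.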
