You can try this regex:
/(?:^|\s)(?:(?:
UPDATE 1:
There are several cases the regex above fails to match, such as #blah23blah and #23blah23. The regex has therefore been modified to handle all of these cases.
Regex:
/(?:\s|^)(?:#(?!\d+(?:\s|$)))(\w+)(?=\s|$)/i
Breakdown:
(?:\s|^) - Matches the preceding whitespace or the start of the line, without capturing it.
# - Matches a literal hash, which is not captured.
(?!\d+(?:\s|$)) - Negative lookahead that rejects tags consisting only of digits between the # and the next whitespace (or end of line).
(\w+) - Matches and captures all the word characters of the tag.
(?=\s|$) - Positive lookahead that requires whitespace or the end of the line to follow. Because the lookahead does not consume that whitespace, adjacent valid hashtags still match.
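The breakdown above can also be written into the pattern itself. The following is a minimal sketch in Python (the choice of language, the names HASHTAG_VERBOSE and HASHTAG_COMPACT, and the sample string are assumptions, not part of the original answer). It uses re.VERBOSE, where an unescaped # starts a comment, so the literal hash is written as \#.

```python
import re

# UPDATE 1 pattern spelled out piece by piece. In verbose mode whitespace is
# ignored and an unescaped "#" starts a comment, so the literal hash is "\#".
HASHTAG_VERBOSE = re.compile(
    r"""
    (?:\s|^)           # preceding whitespace or start of string, not captured
    \#                 # literal hash, matched but not captured
    (?!\d+(?:\s|$))    # negative lookahead: reject tags made only of digits
    (\w+)              # capture the word characters of the tag
    (?=\s|$)           # positive lookahead: whitespace or end of string follows
    """,
    re.IGNORECASE | re.VERBOSE,
)

# The same pattern in its compact form, for comparison.
HASHTAG_COMPACT = re.compile(
    r"(?:\s|^)(?:#(?!\d+(?:\s|$)))(\w+)(?=\s|$)", re.IGNORECASE
)

sample = "#tag1 #42 #2nd"  # hypothetical input
print(HASHTAG_VERBOSE.findall(sample))  # ['tag1', '2nd']
```

With a single capture group, findall returns just the captured tag names, so both forms should give the same list for the same input.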
Example text modified to capture most cases:
#blah Pack your #box with #5 dozen #good2 #3good liquor.#jugs link.com/liquor#jugs #mkvef214asdwq sd #3e4 flsd #2good #first#second #3
Matches:
Match 1: blah
Match 2: box
Match 3: good2
Match 4: 3good
Match 5: mkvef214asdwq
Match 6: 3e4
Match 7: 2good
Rubular link
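To reproduce the match list locally, here is a small sketch in Python (an assumption, as the answer itself only names Rubular). The sample string is the example text written with each tag directly after its #, which is an assumption about the intended input. The variable names are illustrative.

```python
import re

# UPDATE 1 pattern, translated from the /.../i literal to a Python pattern.
HASHTAG = re.compile(r"(?:\s|^)(?:#(?!\d+(?:\s|$)))(\w+)(?=\s|$)", re.IGNORECASE)

# Example text, assuming no space between "#" and the tag name.
text = (
    "#blah Pack your #box with #5 dozen #good2 #3good liquor.#jugs "
    "link.com/liquor#jugs #mkvef214asdwq sd #3e4 flsd #2good #first#second #3"
)

matches = HASHTAG.findall(text)
for i, tag in enumerate(matches, start=1):
    print(f"Match {i}: {tag}")
```

Under that assumption, #5 and #3 are skipped because they are all digits, liquor.#jugs and liquor#jugs because the hash is not preceded by whitespace, and #first#second because first is not followed by whitespace.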
UPDATE 2:
To also exclude words that begin or end with an underscore, simply add those exceptions to the negative lookahead:
/(?:\s|^)(?:#(?!(?:\d+|\w+?_|_\w+?)(?:\s|$)))(\w+)(?=\s|$)/i
The pattern, test text and matches are saved in this Rubular link
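A quick way to check the underscore rule is sketched below in Python (an assumption, as are the name HASHTAG_NO_EDGE_UNDERSCORE and the sample tags, which are hypothetical and not taken from the Rubular test text).

```python
import re

# UPDATE 2 pattern: the negative lookahead now also rejects tags that end
# with "_" (\w+?_) or begin with "_" (_\w+?), in addition to all-digit tags.
HASHTAG_NO_EDGE_UNDERSCORE = re.compile(
    r"(?:\s|^)(?:#(?!(?:\d+|\w+?_|_\w+?)(?:\s|$)))(\w+)(?=\s|$)",
    re.IGNORECASE,
)

sample = "#foo_bar #_foo #foo_ #123 #ok"  # hypothetical input
print(HASHTAG_NO_EDGE_UNDERSCORE.findall(sample))  # ['foo_bar', 'ok']
```

An underscore in the middle of a tag, as in #foo_bar, is still accepted. Only a leading or trailing underscore makes the lookahead reject the tag.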