I have a specific regex:
#\b[a-z0-9-_%"]+\b#gi
I apply this regular expression to the following test line:
abc def ghi jkl mno %%car% __car_ tall-person "thing" 20% %30%
However, the boundaries of the detected words are as follows (square brackets mark the start and end of each match):
[abc] [def] [ghi] [jkl] [mno] %%[car]% [__car_] [tall-person] "[thing]" [20]% %[30]%
Thus, some punctuation ("_") is treated as a "word character" at the beginning and at the end of the word. Other punctuation ("%" or double quotes) is left out of the match when it is at the beginning or at the end of the word. Why is this?
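\b matches only at a position that has a word character (\w) on one side and a non-word character, or the start or end of the string, on the other. \w is [A-Za-z0-9_], so "_" counts as a word character, while % and " do not. Between a space and % or " there is no word boundary, so a match cannot start or end on those characters. If you want them included, drop \b from the pattern: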
// javascript example
> 'abc def ghi jkl mno %%car% __car_ tall-person "thing" 20% %30%'.match(/[a-z0-9-_%"]+/g)
["abc", "def", "ghi", "jkl", "mno", "%%car%", "__car_", "tall-person", "\"thing\"", "20%", "%30%"]
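To see the difference side by side, here is a minimal sketch that runs the pattern from the question with and without \b on the same test line. The helper name isWordChar and the variable names are only illustrative; the hyphen in the character class is escaped for clarity, which does not change what it matches.

```javascript
// Test line from the question.
const line = 'abc def ghi jkl mno %%car% __car_ tall-person "thing" 20% %30%';

// Illustrative helper: is this single character a word character (\w)?
const isWordChar = (ch) => /\w/.test(ch);

// Pattern from the question: \b requires a \w character on exactly one side,
// so a match can neither start nor end on % or ".
const withBoundary = line.match(/\b[a-z0-9\-_%"]+\b/gi);

// Same character class without \b: % and " are kept at both ends.
const withoutBoundary = line.match(/[a-z0-9\-_%"]+/gi);

console.log(isWordChar('_'), isWordChar('%'), isWordChar('"'));
// true false false
console.log(withBoundary);
// ["abc", "def", "ghi", "jkl", "mno", "car", "__car_", "tall-person", "thing", "20", "30"]
console.log(withoutBoundary);
// ["abc", "def", "ghi", "jkl", "mno", "%%car%", "__car_", "tall-person", "\"thing\"", "20%", "%30%"]
```

With \b, the leading and trailing % and " are dropped, because no word boundary exists between them and the surrounding spaces. Only "_" survives at the edges, since it belongs to \w.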