Take it token by token:
/          begin regex literal
<          matches a literal <
(\/?)      matches 0 or 1 (?) literal /, which is escaped with \
(\w+)      matches one or more word characters
([^>]*?)   lazily* matches zero or more (*?) of anything that is not >
>          matches a literal >
/          end regex literal
* lazily: adding ? after a repetition quantifier makes it lazy, which means the regular expression will match the preceding token the minimum number of times possible. See the documentation.
So essentially this regular expression will match "<", optionally followed by "/", followed by one or more letters, digits, or underscores, followed by anything that is not ">", and finally a ">".
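As a rough illustration, the pattern can be tried out in JavaScript. This is only a sketch: the regex is assembled from the tokens listed above, and the helper name describeTag and the sample strings are made up for the example.

```javascript
// Pattern assembled from the tokens in the breakdown.
const tagPattern = /<(\/?)(\w+)([^>]*?)>/;

// Hypothetical helper: returns the three captured groups, or null if nothing matches.
function describeTag(input) {
  const match = tagPattern.exec(input);
  if (match === null) return null;
  const [, slash, name, rest] = match;
  return { closing: slash === "/", name, rest };
}

console.log(describeTag('<a href="#top">')); // closing: false, name: "a", rest: ' href="#top"'
console.log(describeTag("</div>"));          // closing: true,  name: "div", rest: ""
console.log(describeTag("no tags here"));    // null
```

The first group tells you whether it is a closing tag, the second holds the tag name, and the third holds whatever sits between the name and the closing ">".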
Also, the (\w+) token is not redundant, since it ensures there is at least one word character between < and >.
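A quick way to see the effect of (\w+) is to compare the pattern against a variant with that group removed. The variant below is an assumption made purely for comparison, not something from the question, and the sample inputs are made up.

```javascript
const withName = /<(\/?)(\w+)([^>]*?)>/;
// Assumed variant for comparison only: the same pattern with (\w+) removed.
const withoutName = /<(\/?)([^>]*?)>/;

// Collect [input, matches withName, matches withoutName] for each sample.
const results = ["<>", "</>", "<p>"].map(
  (input) => [input, withName.test(input), withoutName.test(input)]
);

console.log(results);
// "<>"  -> false, true
// "</>" -> false, true
// "<p>" -> true,  true
```

Without (\w+), empty brackets such as "<>" and "</>" would be accepted as tags.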
Remember that trying to parse HTML with regular expressions is usually a bad idea.
jbabey