What does (\/?) mean in the regular expression /<(\/?)(\w+)([^>]*?)>/, and is (\w+) redundant?

Question


This regular expression should match an HTML start tag, I think:

var results = html.match(/<(\/?)(\w+)([^>]*?)>/);

I see that it must first match <, but then I am confused about what the capture (\/?) does. Am I reasoning correctly that ([^>]*?)> searches for every character except >, zero or more times? If so, why do you need the capture (\w+)? Isn't that already covered by [^>]*?


javascript regex

1252748 Jul 03 '13 at 16:40


5 answers

Using the power of Debuggex to create an image :)

 <(\/?)(\w+)([^>]*?)>

It is evaluated as follows (the Debuggex diagram is not reproduced here).

As you can see, it matches HTML tags (both opening and closing tags). The regular expression contains three capture groups, which capture the following:

(\/?) the presence of a / (if present, this is a closing tag)
(\w+) the tag name
([^>]*?) everything else up to the end of the tag (for example, the attributes)
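To see the three groups side by side, here is a small sketch in JavaScript (the language of the question) that runs the unchanged regex on an opening and a closing tag. The describeTag helper and its field names are made up for illustration; they are not from the question or from any library.

```javascript
// The regex from the question, unchanged.
const TAG_RE = /<(\/?)(\w+)([^>]*?)>/;

// Hypothetical helper: returns the three capture groups of the first tag
// found in `html`, or null if nothing matches.
function describeTag(html) {
  const m = html.match(TAG_RE);
  if (m === null) return null;
  return {
    slash: m[1], // "/" for a closing tag, "" otherwise
    name: m[2],  // the tag name
    rest: m[3],  // everything else up to the first ">", e.g. attributes
  };
}

console.log(describeTag('<a href="#">')); // { slash: '', name: 'a', rest: ' href="#"' }
console.log(describeTag("</a>"));         // { slash: '/', name: 'a', rest: '' }
```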

So it matches <a href="#">. Interestingly, it does not match <a data-fun="fun>nofun"> correctly, because it stops at the > inside the data-fun attribute value, even though (I think) > is valid in an attribute value.
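A quick way to reproduce the attribute problem, assuming a JavaScript runtime such as Node or a browser console (the two sample strings are the ones from the paragraph above):

```javascript
const TAG_RE = /<(\/?)(\w+)([^>]*?)>/;

const ok = '<a href="#">';
const tricky = '<a data-fun="fun>nofun">';

// The whole tag is matched.
console.log(ok.match(TAG_RE)[0]);     // <a href="#">

// The match stops at the first ">", which sits inside the attribute value,
// so the rest of the tag (nofun">) is left unmatched.
console.log(tricky.match(TAG_RE)[0]); // <a data-fun="fun>
```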

Another funny thing: the tag-name capture does not cover all theoretically valid XHTML tag names. XHTML allows Letter | Digit | '.' | '-' | '_' | ':' | ... (source: XHTML specification). (\w+), however, does not match '.', '-' or ':'. So the imaginary tag <.foobar> will not match this regular expression. However, this should not have any real-life impact.
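A sketch of the tag-name limitation, using the question's regex unchanged. The sample tags other than <.foobar> are hypothetical inputs chosen for illustration; note that a name that merely contains '-' or ':' after its first character still matches, but the captured name is cut off at that character.

```javascript
const TAG_RE = /<(\/?)(\w+)([^>]*?)>/;

// Hypothetical helper: returns the captured tag name, or null if nothing matches.
function tagName(tag) {
  const m = tag.match(TAG_RE);
  return m === null ? null : m[2];
}

// \w does not include ".", "-" or ":", so they never end up in group 2.
console.log(tagName("<.foobar>"));  // null (no match at all)
console.log(tagName("<foo-bar>"));  // foo  ("-bar" lands in group 3)
console.log(tagName("<svg:rect>")); // svg  (":rect" lands in group 3)
```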

As you can see, parsing HTML with regexes is risky. You might be better off with an HTML parser.


tessi Jul 03 '13 at 17:04


(\/?) matches and captures the / of any closing tag, for example </i> or </strong>, if you are familiar with those.

One more note: \w is really the character class [a-zA-Z_\d], so other characters such as = or " do not match it, but they will match [^>]. And yes, you are right about that part.
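To check the character-class point directly, here is a minimal sketch; the classify helper is hypothetical and only tests single characters against \w and [^>].

```javascript
// Hypothetical helper: does a single character match \w, and does it match [^>]?
function classify(ch) {
  return { word: /^\w$/.test(ch), notGt: /^[^>]$/.test(ch) };
}

for (const ch of ["a", "7", "_", "=", '"', " ", ">"]) {
  console.log(JSON.stringify(ch), classify(ch));
}
// Letters, digits and "_" pass both tests; "=", '"' and " " fail \w but
// pass [^>]; only ">" fails both.
```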


Jerry Jul 03 '13 at 16:43


To answer your last question, (\w+) and ([^>]*?) are not redundant. Both of them perform important functions in the expression.

This expression finds start or end tags.

(\/?) matches a /, but the ? makes it optional.

(\w+) matches word characters; here it is intended to match the tag name.

([^>]*?) is intended to match the attributes.

So, if you have the string <div class="text">,

(\w+) in the expression will match div, and ([^>]*?) will match class="text" (together with the space in front of it).
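The same example in JavaScript, as a minimal sketch (destructuring the match array is just one way to read the groups):

```javascript
const TAG_RE = /<(\/?)(\w+)([^>]*?)>/;

// Skip the full match (index 0) and pull out the three groups.
const [, slash, name, attrs] = '<div class="text">'.match(TAG_RE);

console.log(JSON.stringify(slash)); // ""
console.log(name);                  // div
console.log(JSON.stringify(attrs)); // " class=\"text\""
console.log(attrs.trim());          // class="text"
```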


Jason p Jul 03 '13 at 16:45


Demo (in Ruby, not JavaScript, but that doesn't matter here): http://www.rubular.com/r/bhw2O28qUr

To summarize: thanks to (\/?), it also captures end tags.
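To illustrate how the (\/?) group lets you tell end tags from start tags, here is a sketch that walks over every tag in a string. It adds the g flag (String.prototype.matchAll requires it); the listTags name and the "start"/"end" labels are illustrative assumptions.

```javascript
// The question's pattern plus the g flag, so matchAll can find every tag.
const TAG_RE_G = /<(\/?)(\w+)([^>]*?)>/g;

// Hypothetical helper: label each tag as "start" or "end" based on group 1.
function listTags(html) {
  return Array.from(html.matchAll(TAG_RE_G), (m) => ({
    kind: m[1] === "/" ? "end" : "start",
    name: m[2],
  }));
}

console.log(listTags("<p>Hello <b>world</b></p>"));
// kinds and names in order: start p, start b, end b, end p
```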


Tom lord Jul 03 '13 at 16:46


jbabey · Accepted Answer · 2013-07-03T16:44:54+0000

Taking it token by token:

/ begin regex literal
< matches literal <
(\/?) matches 0 or 1 (?) literal /, which is escaped with \
(\w+) matches one or more word characters
([^>]*?) lazily* matches zero or more (*?) of anything that is not >
> matches literal >
/ end regex literal

*lazily: adding ? after a repetition quantifier makes it lazy, which means the regular expression will match the preceding item the minimum number of times. See the documentation.
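A small sketch of what "lazy" changes, using .* rather than the question's [^>] so that the difference is visible (the sample string is made up). The last lines show that, because [^>] can never step over a >, lazy and greedy happen to find the same match in this particular pattern.

```javascript
const html = "<b>bold</b>";

// Greedy: .* grabs as much as possible, then backtracks to the LAST ">".
console.log(html.match(/<.*>/)[0]);  // <b>bold</b>
// Lazy: .*? grabs as little as possible, stopping at the FIRST ">".
console.log(html.match(/<.*?>/)[0]); // <b>

// [^>] cannot consume ">", so both versions stop at the first ">" anyway.
const lazy = /<(\/?)(\w+)([^>]*?)>/;
const greedy = /<(\/?)(\w+)([^>]*)>/;
console.log(html.match(lazy)[0], html.match(greedy)[0]); // <b> <b>
```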

So essentially this regular expression will match “<”, followed by an optional “/”, followed by one or more letters, digits or underscores, followed by anything that is not “>”, and finally a “>”.

Moreover, the token (\w+) is not redundant, since it ensures that at least one word character follows the opening < (or </) before the closing >.
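To see that (\w+) does real work, here is a sketch comparing the question's regex with a hypothetical variant that drops (\w+); the variant exists only for comparison and the test strings are made up.

```javascript
const withName = /<(\/?)(\w+)([^>]*?)>/;
// Hypothetical variant without (\w+), for comparison only.
const withoutName = /<(\/?)([^>]*?)>/;

for (const s of ["<br>", "<>", "</>", "< div>"]) {
  console.log(JSON.stringify(s), withName.test(s), withoutName.test(s));
}
// Only "<br>" matches withName; withoutName accepts all four strings.
```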

Remember that trying to parse HTML with regular expressions is usually a bad idea.
