What does #(\w+)=([\'"])(.*)\\2#U mean?

Question


I am new to regex.

I hope someone can explain what happens in #(\w+)=([\'"])(.*)\\2#U.

preg_match_all('#(\w+)=([\'"])(.*)\\2#U', $str, $matches);

Thanks in advance.

+4

regex

shin Dec 13 '10 at 21:30


4 answers

You are matching strings of the form:

foo='bar'

or

baz="blat"

(\w+) matches one or more word characters. (Word characters are a to z, A to Z, 0 to 9, and underscore.)

= matches the literal equals sign.

([\'"]) matches a single or a double quotation mark.

(.*) matches any sequence of characters, zero or more times.

\\2 is an escaped \2, which in a regex is a backreference to the second capture group. In this case, the second group is either a single or a double quote. Using \2 ensures that the closing quotation mark matches the opening one, so you can use the other quote style inside the string.
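To see the backreference at work, here is a minimal sketch. The input string is made up for illustration; only the pattern comes from the question. Each value contains the other kind of quote, and the last pair opens with " but closes with ', so it should be skipped.

```php
<?php
// Hypothetical input: title="it's here" note='say "hi"' bad="oops'
$str = 'title="it\'s here" note=\'say "hi"\' bad="oops\'';

// Pattern from the question; \\2 in a single-quoted PHP string becomes \2.
preg_match_all('#(\w+)=([\'"])(.*)\\2#U', $str, $matches);

print_r($matches[1]); // names:  title, note
print_r($matches[3]); // values: it's here, say "hi"
```

The bad="oops' pair is not returned, because \2 requires the same quote character that opened the value.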

+2

James kovacs Dec 13 '10 at 21:48


I am going to break it into pieces.

  #

This opens the regex string (# is the delimiter).

  (\w+)

This matches and captures one or more word characters: upper- and lowercase letters, digits, and underscore. It stops when it reaches the = sign.

  =

Matches a literal =.

  ([\'"])

Matches and captures either a ' character or a " character.

  (.*)\\2

Matches and captures any characters except \n, up to the next occurrence of whatever the second group captured. In the single-quoted PHP string, \\ stands for a single \, so the regex engine sees \2, a backreference to the second group.

  #

This closes the regex string.

  U

This is a pattern modifier. Uppercase U is PCRE_UNGREEDY, which makes quantifiers match as little as possible by default; it is the lowercase u modifier that treats the pattern as UTF-8.

Each match captures 3 groups, and they are put in an array called $matches in the order in which the groups appear in the pattern.

This should help you.
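As a rough illustration of how the captures end up in the result array, here is a small sketch. The input string and the variable names $byGroup and $byMatch are assumptions made for the example. By default preg_match_all groups results by capture group (PREG_PATTERN_ORDER); the PREG_SET_ORDER flag groups them by match instead.

```php
<?php
// Hypothetical input: foo="string" bar='strung'
$str = 'foo="string" bar=\'strung\'';
$pattern = '#(\w+)=([\'"])(.*)\\2#U';

// Default (PREG_PATTERN_ORDER): one sub-array per capture group.
// [0] = full matches, [1] = names, [2] = quote characters, [3] = values.
preg_match_all($pattern, $str, $byGroup);

// PREG_SET_ORDER: one sub-array per match, each holding [0]..[3].
preg_match_all($pattern, $str, $byMatch, PREG_SET_ORDER);

foreach ($byMatch as $m) {
    echo $m[1], ' => ', $m[3], PHP_EOL; // foo => string, then bar => strung
}
```

Which layout is more convenient depends on whether the code wants all names at once or wants to walk the pairs one by one.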

+1

Weegee Dec 13 '10 at 21:50


This regular expression is probably used to get name/value pairs that have a name consisting of one or more word characters (\w+), followed by an equals sign (=), followed by a quoted string that is wrapped in either single or double quotation marks (['"]). The rest ((.*)\2) is simply used to get everything between the quotes, while \2 ensures that the closing quotation mark is the same as the opening one (the one matched by the second subpattern). Because the U modifier is used, all quantifiers are ungreedy: they match as little as possible.
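The effect of the U modifier can be checked with a short comparison. This is only a sketch: the variable names are made up, and the third pattern (with .*? and no U) is an alternative spelling added for comparison, not something from the question.

```php
<?php
$str = 'foo="string" bar="strung"';

$ungreedy = '#(\w+)=([\'"])(.*)\\2#U';  // pattern from the question
$greedy   = '#(\w+)=([\'"])(.*)\\2#';   // same pattern without U
$lazy     = '#(\w+)=([\'"])(.*?)\\2#';  // no U, but .* made lazy with ?

preg_match_all($ungreedy, $str, $u);
preg_match_all($greedy, $str, $g);
preg_match_all($lazy, $str, $l);

print_r($u[3]); // string, strung
print_r($g[3]); // string" bar="strung  (one greedy match)
print_r($l[3]); // string, strung
```

Note that U flips the default for every quantifier in the pattern, so under U a trailing ? would make a quantifier greedy again.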

+1

Gumbo Dec 13 '10 at 21:50


Antal spector-zabusky · Accepted Answer · 2010-12-13T21:50:58+0000

Let me break it into pieces. To begin with, note that preg_match_all expects delimiters around its regular expression, so the # characters don't match anything, and neither does the U: it is a modifier that makes the match "ungreedy". This means that instead of matching as much as possible, all quantifiers (?, *, +, {,}) will match as little as possible. Then, piece by piece:

1. (\w+): \w matches a word character, that is, anything alphanumeric or an underscore; + matches one or more of them; and the parentheses group it and store it in the first capture group, which can be accessed with \1.
2. =: matches a literal =. Very simple :)
3. ([\'"]): the square brackets introduce a character class, which is a shorthand way of saying "match any one of these characters". Here the character class is ['"], but since the pattern sits in a single-quoted PHP string, the single quote must be escaped as \'. Thus, this matches either ' or ", and saves the result in the second capture group, which can be accessed with \2. This is the only capture group that the regular expression itself refers back to.
4. (.*): . matches any character except a newline, and * matches any number of them (zero or more). This is why the U modifier is important! Without it, .* would grab as much of the line as it could, up to the last matching quote; with it, it stops at the first point where the rest of the pattern can match. Note that since it is in parentheses, it is the third capture group, which can be accessed with \3 (shocking).
5. \\2: if we didn't have to escape the backslash, this would just be \2: the contents of the second capture group. In this case, that is whichever quote we matched in step 3.

Putting it all together, this regular expression matches, roughly, a variable name (step 1), followed by an equals sign (step 2), followed by a string (steps 3-5). The reason for \2 is so that the regular expression will not match "string', and the reason for the U modifier is so that foo="string" bar="strung" yields two matches, foo="string" and bar="strung" (with \1 being foo and bar, and \3 being string and strung), rather than the single greedy match foo="string" bar="strung" (with \1 being foo and \3 being string" bar="strung). Some examples that match:

 foo_bar_123="John applesauce."
 100='seventeen'
 banana_split=""
 _="This is a normal string"

These items can be scattered throughout the string, on the same line or on different lines, inside surrounding text or not, as long as each item is itself on a single line. Note that whitespace around the = is not allowed, so foo = "bar" will not match.
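The examples above can be tried out with a sketch like the following. The surrounding filler text and the layout of the input are assumptions made for the illustration; the four pairs and the non-matching foo = "bar" are the ones from this answer.

```php
<?php
// Nowdoc: the text is taken literally, no escaping needed.
$str = <<<'TXT'
some text foo_bar_123="John applesauce." and 100='seventeen'
banana_split="" more text _="This is a normal string"
foo = "bar"
TXT;

$count = preg_match_all('#(\w+)=([\'"])(.*)\\2#U', $str, $matches);

echo $count, PHP_EOL;  // 4
print_r($matches[1]);  // foo_bar_123, 100, banana_split, _
print_r($matches[3]);  // John applesauce., seventeen, (empty string), This is a normal string
```

The last line of the input contributes nothing, since the spaces around = break the \w+= part of the pattern.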
