What is the difference between these regular expressions?

  • (\d+|) vs (\d+)?
  • [\w\W] vs [\d\D] vs .

Is there any difference between these regular expressions? Which one should I choose?

I am using JavaScript.

+1
regex
01 Feb '13 at 14:43
3 answers

[\w\W] and [\d\D] are used in languages such as JavaScript, which does not have a dotall option (the s flag was only added in ES2018). Either class matches every character, including newlines, as opposed to . which matches everything but a newline.

  [\w\W] or [\d\D]    -> matches everything, including newline characters
  .                   -> matches everything except newline characters, unless 's' (dotall modifier) is specified
  (\d+|) or (\d+)?    -> matches 1 or more digits OR the empty string at any position
                         It could simply be written as '(\d*)'
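The newline behavior described above can be checked with a short JavaScript sketch. The variable names are illustrative only, and the last line assumes an engine that supports the ES2018 `s` (dotAll) flag.

```javascript
// Sketch: how `.` and [\w\W] treat a newline in JavaScript.
const text = "a\nb";

// `.` stops at the newline, so only "a" is matched.
const dotMatch = text.match(/.+/)[0];

// [\w\W] (like [\d\D] or [\s\S]) matches the newline as well.
const classMatch = text.match(/[\w\W]+/)[0];

// Assumption: the engine supports the ES2018 `s` flag, which makes `.` match "\n".
const dotAllMatch = text.match(new RegExp(".+", "s"))[0];

console.log(JSON.stringify(dotMatch));    // "a"
console.log(JSON.stringify(classMatch));  // "a\nb"
console.log(JSON.stringify(dotAllMatch)); // "a\nb"
```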
+5
Feb 01 '13 at 14:46

The second is interesting, and I would like to say something about it:

  • [\w\W] and [\d\D] are equivalent, and they are also equivalent to [\s\S]. \W is the complement of the character set \w, and the same applies to the pairs \d - \D and \s - \S. Therefore, when combined, each pair matches any character without exception.

    They are commonly used when the flavor has no construct to "match any character without exception." JavaScript is one example. JavaScript also has a lesser-known and rather confusing construct for this, [^], which is usually not valid in other flavors.

  • The dot . usually matches any character except a newline \n. Depending on the language, it may exclude more characters.

    In Java, it excludes \n, \r, \u0085, \u2028 and \u2029. So . is equivalent to [^\n\r\u0085\u2028\u2029].

    In JavaScript, the dot . excludes \r, \u2028 and \u2029 in addition to \n. So . is equivalent to [^\n\r\u2028\u2029].

    Some languages provide a mode in which . matches any character without exception. It is called DOTALL mode in Java and Python, and single-line mode in C# and Perl.

The behavior of . varies from language to language. As a rule, all flavors agree that \n should be excluded in "normal" mode, but they differ slightly in which other characters are excluded.
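As a rough check of the JavaScript claims above, the following sketch tests each excluded character against `.`, `[^]` and `[\s\S]`. The array and variable names are made up for illustration; the `\u0085` line is included only as a contrast with the Java list.

```javascript
// The four characters that `.` does not match in JavaScript (without the `s` flag).
const lineTerminators = ["\n", "\r", "\u2028", "\u2029"];

// `.` should reject every one of them.
const dotResults = lineTerminators.map((ch) => /^.$/.test(ch));

// [^] and [\s\S] should accept every one of them.
const anyResults = lineTerminators.map(
  (ch) => /^[^]$/.test(ch) && /^[\s\S]$/.test(ch)
);

// \u0085 is excluded by `.` in Java, but JavaScript's `.` matches it.
const nelMatchesDot = /^.$/.test("\u0085");

console.log(dotResults);    // [ false, false, false, false ]
console.log(anyResults);    // [ true, true, true, true ]
console.log(nelMatchesDot); // true
```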

+3
Feb 01 '13 at 14:48

You did not say which language you are using, so I'm going to assume Perl.

  • (\d+|) is equivalent to (\d*). It matches a sequence of 0 or more digits and captures the result in $1. (\d)? matches 0 or 1 digit. If it matches a digit, it puts it in $1; otherwise $1 will be undef (you can rewrite it as (?:(\d)|) if you want to get rid of the ?).

  • [\w\W] and [\d\D] are equivalent; both match any character. By default, . is equivalent to [^\n] (it matches any character except a newline). If you really want to match any character, you should use . and specify the /s flag, which makes . match any character.
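The answer above describes Perl. As a rough JavaScript analogue (the language the question mentions), the capture-group difference shows up as an empty string versus `undefined`. This is only a sketch; the input strings and variable names are invented for the example.

```javascript
// Input that does not start with a digit, so every pattern matches "" at index 0.
const input = "abc";

// (\d+|): the empty alternative matches, so the group captures "".
const alt = /(\d+|)/.exec(input);

// (\d*): zero digits is still a match for the group, so it captures "".
const star = /(\d*)/.exec(input);

// (\d+)?: the group is skipped entirely, so it is undefined (undef in Perl).
const optional = /(\d+)?/.exec(input);

console.log(JSON.stringify(alt[1]));  // ""
console.log(JSON.stringify(star[1])); // ""
console.log(optional[1]);             // undefined

// When digits are present, all three capture the same text.
const withDigits = [/(\d+|)/, /(\d*)/, /(\d+)?/].map((re) => re.exec("42abc")[1]);
console.log(withDigits); // [ '42', '42', '42' ]
```

So the three patterns match the same text; they differ only in whether the group is reported as empty or as unset when no digits are found.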

+2
Feb 01 '13 at 14:48