Which whitespace characters does \s match in PHP?

What is the complete list of characters matched by the \s escape sequence in PHP? Some regular expression flavors include vertical whitespace and other characters in this escape sequence.

3 answers

From the pcrepattern specification page:

Common character types

\s any white space character 

For compatibility with Perl, \s did not used to match the VT character (code 11), which made it different from the POSIX "space" class. However, Perl added VT at release 5.18, and PCRE followed suit at release 8.34. The default \s characters are now HT (9), LF (10), VT (11), FF (12), CR (13), and space (32), which are defined as white space in the "C" locale. This list may vary if locale-specific matching is taking place. For example, in some locales the "non-breaking space" character (\xA0) is recognized as white space, and in others the VT character is not.

So \s always matches at least five characters (HT, LF, FF, CR, and space), plus VT and possibly others depending on:

  • the PCRE library version
  • the configured locale

This test compares the results of preg_match across various versions of PHP.
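One way to check this on a given installation is to probe every single-byte value against \s. The following is a minimal sketch, not the original test; the helper name `whitespaceBytes` is made up for illustration, and the result depends on the PCRE version bundled with PHP and on the active locale, so no expected output is shown.

```php
<?php
// Hypothetical helper: collect the byte values (0-255) that \s matches
// under the current PCRE version and locale. The /u modifier is not
// used, so each subject is treated as a single byte.
function whitespaceBytes(): array
{
    $matched = [];
    for ($code = 0; $code <= 255; $code++) {
        // \A and \z anchor the match to the whole one-byte subject.
        if (preg_match('/\A\s\z/', chr($code)) === 1) {
            $matched[] = $code;
        }
    }
    return $matched;
}

// PCRE_VERSION is the version string of the PCRE library PHP was built with.
printf("PHP %s, PCRE %s\n", PHP_VERSION, PCRE_VERSION);
echo implode(', ', whitespaceBytes()), "\n";
```

Running the script on several PHP versions, or after calling setlocale() with different LC_CTYPE values, shows whether VT (11) or \xA0 (160) are included on each setup.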


PHP also has \h, which matches only horizontal whitespace characters: http://www.php.net/manual/en/regexp.reference.escape.php

According to http://www.pcre.org/pcre.txt :

For compatibility with Perl, \s does not match the VT character (code 11). This makes it different from the POSIX "space" class. The \s characters are HT (9), LF (10), FF (12), CR (13), and space (32). If "use locale;" is included in a Perl script, \s may match the VT character. In PCRE, it never does.

So, if "vertical space" refers to the vertical tab, the answer is no, at least for the PCRE version this text describes.

  The sequences \h, \H, \v, and \V are features that were added to Perl
 at release 5.10.  In contrast to the other sequences, which match only
 ASCII characters by default, these always match certain high-valued
 codepoints in UTF-8 mode, whether or not PCRE_UCP is set.

 The horizontal space characters are:

          U+0009 Horizontal tab
          U+0020 Space
          U+00A0 Non-break space
          U+1680 Ogham space mark
          U+180E Mongolian vowel separator
          U+2000 En quad
          U+2001 Em quad
          U+2002 En space
          U+2003 Em space
          U+2004 Three-per-em space
          U+2005 Four-per-em space
          U+2006 Six-per-em space
          U+2007 Figure space
          U+2008 Punctuation space
          U+2009 Thin space
          U+200A Hair space
          U+202F Narrow no-break space
          U+205F Medium mathematical space
          U+3000 Ideographic space

 The vertical space characters are:

          U+000A Linefeed
          U+000B Vertical tab
          U+000C Formfeed
          U+000D Carriage return
          U+0085 Next line
          U+2028 Line separator
          U+2029 Paragraph separator
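
To see the \h and \v lists in action, a few of the code points above can be tested in UTF-8 mode (the /u modifier). This is only a sketch: the helper name `matchesClass` and the choice of samples are illustrative, and the `\u{...}` string escapes assume PHP 7 or later.

```php
<?php
// Hypothetical helper: does the single UTF-8 character $char match the
// given escape sequence (for example '\h' or '\v') in UTF-8 mode?
function matchesClass(string $class, string $char): bool
{
    return preg_match('/\A' . $class . '\z/u', $char) === 1;
}

// A few sample code points taken from the two lists above.
$samples = [
    'U+0009 Horizontal tab'     => "\u{0009}",
    'U+00A0 Non-break space'    => "\u{00A0}",
    'U+3000 Ideographic space'  => "\u{3000}",
    'U+000A Linefeed'           => "\u{000A}",
    'U+0085 Next line'          => "\u{0085}",
    'U+2028 Line separator'     => "\u{2028}",
];

foreach ($samples as $label => $char) {
    printf(
        "%-26s \\h: %-3s \\v: %s\n",
        $label,
        matchesClass('\h', $char) ? 'yes' : 'no',
        matchesClass('\v', $char) ? 'yes' : 'no'
    );
}
```

Replacing '\h' or '\v' with '\s' in the same helper is a quick way to compare what \s covers on a particular PHP build.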

From http://www.pcre.org/pcre.txt :

\s any character that \p{Z} matches, plus HT, LF, FF, CR
