How many characters look like a space but aren't spaces?

If I want to see the hexadecimal representation of a space in PHP, I can use bin2hex:

 php > var_dump(bin2hex(" "));
 string(2) "20"

I can also get the space character back from "20":

 php > var_dump(hex2bin("20"));
 string(1) " "

But there are other Unicode characters that render as a space, for example NO-BREAK SPACE (UTF-8 bytes c2 a0):

 php > var_dump(hex2bin('c2a0'));
 string(2) " "
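To see where bytes such as c2 a0 come from, it can help to dump the UTF-8 encoding of a few space-like code points. This is only a minimal sketch, assuming PHP 7.2+ with the mbstring extension; the helper name utf8_hex is made up for illustration and is not part of PHP:

```php
<?php
// Hypothetical helper (not a PHP built-in): hex dump of the UTF-8
// encoding of a single Unicode code point.
function utf8_hex(int $codePoint): string
{
    // mb_chr() turns a code point into a string in the given encoding.
    return bin2hex(mb_chr($codePoint, 'UTF-8'));
}

echo utf8_hex(0x0020), "\n"; // 20      SPACE
echo utf8_hex(0x00A0), "\n"; // c2a0    NO-BREAK SPACE
echo utf8_hex(0x2003), "\n"; // e28083  EM SPACE
echo utf8_hex(0x3000), "\n"; // e38080  IDEOGRAPHIC SPACE
```

All four render as blank space, yet only the first is the single byte 0x20, which is why a plain comparison against " " misses the others.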

So I can receive a string (for example, from an HTTP request) in which I cannot tell by eye that something that looks like a space is not one. So I use:

 $string = preg_replace('~\x{00a0}~siu', ' ', $string); 

Is there a better way to find and replace all such space-like characters with PHP?

php regex
1 answer

You can use the Unicode \p{Zs} category:

Zs: space separator

 $string = preg_replace('~\p{Zs}~u', ' ', $string); 

The \p{Zs} Unicode category matches these space-like characters:

 Character  Name
 U+0020     SPACE
 U+00A0     NO-BREAK SPACE
 U+1680     OGHAM SPACE MARK
 U+2000     EN QUAD
 U+2001     EM QUAD
 U+2002     EN SPACE
 U+2003     EM SPACE
 U+2004     THREE-PER-EM SPACE
 U+2005     FOUR-PER-EM SPACE
 U+2006     SIX-PER-EM SPACE
 U+2007     FIGURE SPACE
 U+2008     PUNCTUATION SPACE
 U+2009     THIN SPACE
 U+200A     HAIR SPACE
 U+202F     NARROW NO-BREAK SPACE
 U+205F     MEDIUM MATHEMATICAL SPACE
 U+3000     IDEOGRAPHIC SPACE
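As a rough illustration of how this could be wrapped for reuse, here is a small sketch. The function name normalize_spaces is hypothetical, the "\u{...}" escapes assume PHP 7.0+, and the input is assumed to be UTF-8:

```php
<?php
// Hypothetical helper: replace every character in the Unicode
// "space separator" category (\p{Zs}) with a plain U+0020 space.
function normalize_spaces(string $text): string
{
    // With the /u modifier, preg_replace() returns null when the
    // subject is not valid UTF-8, so that case is handled explicitly.
    $result = preg_replace('~\p{Zs}~u', ' ', $text);
    if ($result === null) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $result;
}

// NO-BREAK SPACE, EM SPACE and IDEOGRAPHIC SPACE all become U+0020.
$input = "a\u{00A0}b\u{2003}c\u{3000}d";
var_dump(normalize_spaces($input) === 'a b c d'); // bool(true)
```

Note that \p{Zs} covers only space separators. Tabs and newlines (control characters) and U+200B ZERO WIDTH SPACE (a format character) are left untouched, so a wider character class would be needed if those should be normalized as well.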