Matching UTF-8 characters with preg_match in PHP: (*UTF8) works on Windows, but not Linux

I have a simple regex to validate a username:

preg_match('/(*UTF8)^[[:alnum:]]([[:alnum:]]|[ _.-])+$/i', $username); 

In local testing (Windows 7 using WAMP), this accepts usernames containing UTF-8 characters (e.g. é or ñ). However, when I test it on the server where the site will be hosted, I get the following warning:

Warning: preg_match() [function.preg-match]: Compilation failed: (*VERB) not recognized at offset 5 in /home/sites/vgmusic.com/test/Core/Impl/FormElementValidator.php on line 12

I also tried this on a local Ubuntu installation and got the same error. In fact, the only place I have seen this work is my local Windows development environment. Is there a way to allow special characters that works on all operating systems?

php regex unicode
3 answers

Try describing the characters using Unicode character properties:

 preg_match('/^\p{L}[\p{L} _.-]+$/u', $username) 
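
As a rough illustration of how this pattern behaves, here is a minimal sketch. The function name `isValidUsername` and the sample usernames are made up for the example, and it assumes the source file is saved as UTF-8 and that PCRE was built with Unicode property support.

```php
<?php
// Hypothetical helper wrapping the pattern from this answer.
function isValidUsername($username)
{
    // \p{L} matches any Unicode letter. The /u modifier makes PCRE treat
    // both the pattern and the subject as UTF-8.
    return preg_match('/^\p{L}[\p{L} _.-]+$/u', $username) === 1;
}

var_dump(isValidUsername('José_ñ')); // starts with a letter, only allowed characters follow
var_dump(isValidUsername('_jose'));  // rejected: first character is not a letter
var_dump(isValidUsername('jose99')); // rejected: digits are not covered by \p{L}
```

Note that, unlike `[[:alnum:]]` in the question, `\p{L}` does not cover digits. If digits should be accepted, adding `\p{N}` to the character classes would be one way to do it.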

I had already tried with the /u option specified. On Windows (PHP 5.2.16), adding the /u option worked fine for matching a string containing Unicode characters. However, on CentOS 5 with PHP 5.2.16 I still couldn’t match a string containing Unicode characters using .* (preg_match simply failed to capture anything).

After a long time getting nowhere, including messing around with the "LOCALE" settings, which changed nothing, I finally found this site.

I ran rpm -Uvh on the appropriate version of the RPM, restarted Apache, and all of a sudden my regular expressions worked fine!

Although I had UTF-8 support from the start, my regular expressions did not match Unicode strings until I installed the updated RPM, which also adds “Unicode Property Support”. I thought UTF-8 support would be enough, but apparently not.
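
To see which of the two features a given server has, a small probe along these lines may help. This is only a sketch: the function name `pcreUnicodeSupport` and the array keys are made up for the example, and it assumes the file is saved as UTF-8. It relies on `preg_match` returning `false` when a pattern fails to compile.

```php
<?php
// Hypothetical helper: reports what the PCRE library used by PHP supports.
function pcreUnicodeSupport()
{
    // @ suppresses the compilation warning; preg_match() returns false
    // when the pattern cannot be compiled, so "=== 1" becomes false.
    return array(
        'pcre_version'       => PCRE_VERSION,
        // UTF-8 mode: "." should consume the two-byte "é" as one character.
        'utf8_mode'          => @preg_match('/^.$/u', 'é') === 1,
        // Unicode properties: \p{L} needs PCRE built with property support.
        'unicode_properties' => @preg_match('/^\p{L}$/u', 'é') === 1,
    );
}

print_r(pcreUnicodeSupport());
```

If `utf8_mode` comes back true but `unicode_properties` is false, that would be consistent with the situation described above, where UTF-8 support alone was not enough.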


This seems like an old post, but since it is a topic of lasting interest, I will post what I found here. It is a small difference, but it makes the code simpler: the curly braces are optional for single-letter property names such as L.

The code from Gumbo and Scott above can be written more simply. For example, if you want to allow only letters (Unicode and non-Unicode) and spaces:

 preg_match("/^\pL[\pL ]+$/u",$string) 

I also noticed that preg_match accepts even simpler code:

 preg_match("/^[\pL ]+$/u",$string) 
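
A quick comparison of the variants, as a sketch only: the variable names and sample strings are invented for the example, and the file is assumed to be saved as UTF-8. It suggests that `\pL` and `\p{L}` behave the same, while the last, shortest pattern is not quite equivalent, because it no longer requires a leading letter or a minimum of two characters.

```php
<?php
$braced   = '/^\p{L}[\p{L} ]+$/u'; // braces around the property name
$short    = '/^\pL[\pL ]+$/u';     // same pattern, braces omitted
$simplest = '/^[\pL ]+$/u';        // no separate rule for the first character

foreach (array('Ñandú grande', ' leading space', 'é') as $s) {
    // preg_match() returns 1 on a match and 0 on no match.
    printf(
        "%-16s braced=%d short=%d simplest=%d\n",
        $s,
        preg_match($braced, $s),
        preg_match($short, $s),
        preg_match($simplest, $s)
    );
}
```

The first string matches all three patterns. The string with a leading space and the single letter match only the shortest pattern, so which one to use depends on whether those inputs should be accepted.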