Are string functions ASCII safe in PHP?

Some PHP string functions (e.g. strtoupper) are locale-dependent. But it is still unclear to me whether the locale matters when I know for certain that a particular string consists only of ASCII characters (0-127). Can I guarantee that strtoupper('abc..xyz') will always return ABC..XYZ regardless of the locale? Do PHP string functions work the same in the ASCII range regardless of locale?

While the answer about strtoupper is important to me, the question is more general and concerns the whole library of string functions.

I want to be sure that the language chosen by the user (on a multilingual site) does not break my basic functionality, which has nothing to do with internationalization.

2 answers

Do PHP string functions work the same in the ASCII range regardless of locale?

No, I'm afraid not. The classic counterexample is the dreaded Turkish dotted I:

 setlocale(LC_CTYPE, "tr_TR");
 echo strtoupper('hi!'); // 'H\xDD!' ('Hİ!' in ISO-8859-9)

In the worst case, you may have to provide your own locale-independent string handling. Calling setlocale to revert to C or some other locale is a fix of sorts, but the POSIX process-level locale model is a very poor fit for modern client/server applications.
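A minimal sketch of what such locale-independent handling could look like, assuming only the ASCII letters a-z and A-Z need case conversion. The function names ascii_strtoupper and ascii_strtolower are hypothetical and not part of PHP; the explicit byte map passed to strtr never consults the LC_CTYPE locale.

```php
<?php
// Hypothetical helpers (not part of PHP): change case for ASCII letters only.
// strtr() with two equal-length strings maps byte to byte, so the current
// locale is never consulted and bytes outside a-z / A-Z pass through untouched.
function ascii_strtoupper(string $s): string
{
    return strtr($s, 'abcdefghijklmnopqrstuvwxyz', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ');
}

function ascii_strtolower(string $s): string
{
    return strtr($s, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz');
}

echo ascii_strtoupper('hi!'), "\n"; // HI!
echo ascii_strtolower('HI!'), "\n"; // hi!
```

As far as I know, PHP 8.2 and later made strtoupper() and strtolower() locale-insensitive, so a workaround like this matters mainly on older versions.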


PHP's string functions treat one byte as one character. For the ASCII range 0-127 that is fine.

To safely handle multiple languages using UTF-8, use the mb_*() functions, a UTF-8 library, or wait until 2030 when PHP6 is released.
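A short sketch of the difference between the byte-oriented functions and their mb_*() counterparts, assuming the mbstring extension is loaded and the text is UTF-8. The variable name $word is just for illustration; the encoding is passed explicitly so the result does not depend on configuration defaults.

```php
<?php
// Assumes the mbstring extension is available.
// "\u{e9}" is e-acute, which takes two bytes in UTF-8.
$word = "caf\u{e9}";

echo strlen($word), "\n";             // 5 (counts bytes)
echo mb_strlen($word, 'UTF-8'), "\n"; // 4 (counts characters)

// mb_strtoupper() converts case using Unicode rules for the given encoding.
$upper = mb_strtoupper($word, 'UTF-8');
echo $upper === "CAF\u{c9}" ? "match" : "no match", "\n"; // match
```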

