UTF-8 is a Unicode encoding: a way of representing an (abstract) sequence of Unicode characters as a (specific) sequence of bytes. There are other encodings, such as UTF-16 (which comes in both big-endian and little-endian byte orders). Both UTF-8 and UTF-16 can represent every character in Unicode, so whichever one you choose, you can support all languages.
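To make the "abstract characters vs. concrete bytes" distinction tangible, here is a small Python sketch (Python rather than PHP, purely for illustration). It encodes one string with UTF-8 and with both UTF-16 byte orders, then decodes each result back. The sample text is arbitrary. The explicit "utf-16-be" / "utf-16-le" codec names are used because Python's plain "utf-16" codec prepends a byte-order mark.

```python
# Encode the same abstract character sequence with UTF-8 and with both
# byte orders of UTF-16, then decode each result to show nothing is lost.
text = "Hello, мир, 大"

encodings = ["utf-8", "utf-16-be", "utf-16-le"]
encoded = {name: text.encode(name) for name in encodings}

for name, data in encoded.items():
    # Every encoding round-trips to the identical string.
    assert data.decode(name) == text
    # Show the total size and the first few bytes; note that "H" is
    # 48 in UTF-8, 00 48 in big-endian UTF-16 and 48 00 in little-endian.
    print(f"{name:10} {len(data):3} bytes  {data[:6].hex(' ')}")
```

The byte sequences differ, but all three describe exactly the same 13 characters.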
UTF-8 is useful if most of your text is in Western languages, since it represents ASCII characters in a single byte, but many other characters need more: a character from a non-Latin script such as Chinese requires three bytes. UTF-16, on the other hand, uses exactly two bytes for all the characters you are likely to encounter (although some very esoteric characters outside the Unicode "Basic Multilingual Plane" require four).
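The size trade-off can be checked directly. The Python sketch below counts the bytes one character needs in each encoding; the sample characters are my own choice (an ASCII letter, an accented Latin letter, a Chinese character and an emoji from outside the Basic Multilingual Plane). "utf-16-le" is used so that no byte-order mark inflates the count.

```python
# Compare how many bytes a single character needs in UTF-8 vs UTF-16.
samples = {
    "A": "ASCII letter",
    "\u00e9": "accented Latin letter (é)",
    "\u5927": "Chinese character (大), inside the BMP",
    "\U0001F600": "emoji, outside the BMP",
}


def byte_lengths(ch: str) -> tuple[int, int]:
    """Return (UTF-8 length, UTF-16 length) in bytes for one character."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))


for ch, description in samples.items():
    u8, u16 = byte_lengths(ch)
    print(f"UTF-8: {u8} bytes  UTF-16: {u16} bytes  ({description})")
```

ASCII is cheaper in UTF-8, Chinese is cheaper in UTF-16, and characters outside the BMP take four bytes in both.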
I would not recommend using PHP to develop international software, because it doesn't really support Unicode. It has some extra functions for working with Unicode encodings (look at the Multibyte String extension, mbstring), but the PHP core treats strings as bytes, not characters, so the standard PHP string functions are not suitable for working with characters that are encoded as more than one byte. For example, if you call PHP's strlen() on a string containing the UTF-8 representation of the character "大", it will return 3, because this character takes up three bytes in UTF-8, even though it is only one character. Using string-slicing functions such as substr() is risky, because if you cut in the middle of a multibyte character, you will corrupt the string.
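The failure mode is easy to reproduce without PHP. In the sketch below, Python's bytes type stands in for a PHP string, since both are plain byte sequences; len() plays the role of strlen() and a slice plays the role of substr(). This is an analogy, not PHP code. In PHP itself, the mbstring counterparts are mb_strlen() and mb_substr().

```python
# A Python bytes object is a raw byte sequence, like a PHP string, so
# len() and slicing count bytes just as strlen() and substr() do.
data = "大".encode("utf-8")  # the UTF-8 bytes of one character

print(len(data))  # byte count, analogous to strlen(): 3

fragment = data[:2]  # keep 2 bytes, analogous to substr($s, 0, 2)
try:
    fragment.decode("utf-8")
except UnicodeDecodeError as err:
    # The cut landed inside a 3-byte sequence, so it is no longer valid UTF-8.
    print("corrupted:", err.reason)
```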
Most of the other languages used for web development, such as Java, C# and Python, have built-in Unicode support, so you can put arbitrary Unicode characters in a string and don't have to worry about which encoding is used to represent them in memory, because from your point of view the string contains characters, not bytes. This is a much safer, less error-prone way to work with Unicode text. For this and other reasons (PHP really isn't such a wonderful language), I would recommend using something else.
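For contrast, here is the same kind of operation on a Python 3 str, which is a sequence of Unicode code points. Length and slicing work on characters, and bytes only appear when you explicitly encode, for example when writing to a file or the network. One caveat: Java's and C#'s string types count UTF-16 code units instead, so there a character outside the BMP occupies two positions. The two-character sample string is arbitrary.

```python
# A Python 3 str is a sequence of code points, so len() and slicing
# operate on characters, independent of any later encoding.
s = "大学"  # two Chinese characters

print(len(s))                  # 2 characters
print(s[:1])                   # slicing cannot split a character
print(len(s.encode("utf-8")))  # 6 bytes once encoded as UTF-8

# Encoding happens only at the boundary (files, network, database),
# and decoding restores the original string.
raw = s.encode("utf-8")
assert raw.decode("utf-8") == s
```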
(I read that PHP 6 will have proper Unicode support, but this is not yet available.)
Wyzard