What could go wrong when switching HTML encoding from UTF-8 to UTF-16?

What are the implications of switching HTML pages from UTF-8 to UTF-16? I would like to hear your thoughts on this. Is there anything I need to consider before making such a change?

Note: I am asking because I have a huge amount of Japanese and Chinese text to process.

+5
html encoding utf-8 utf-16
6 answers

I can think of a few things that would go wrong:

  • You MUST declare UTF-16 in the HTTP header. Unlike UTF-8, UTF-16 is not ASCII-compatible, which means that everything has to be UTF-16 from the very first byte.
  • Older clients do not support UTF-16. For example, some software on Windows 9x, and perhaps Mac OS 9 as well.
  • Oh wait, I almost forgot: North American and European copies of Windows XP do not have Asian fonts installed by default.
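
To illustrate the first point, here is a minimal Python sketch of declaring UTF-16 in the `Content-Type` header. The function name `build_utf16_response` and the sample page are made up for illustration; a real server or framework will have its own way of setting the header.

```python
def build_utf16_response(html: str) -> tuple:
    """Encode a page as UTF-16 and build a matching Content-Type header."""
    # Python's "utf-16" codec prepends a byte order mark (BOM) on encode.
    body = html.encode("utf-16")
    headers = {
        "Content-Type": "text/html; charset=utf-16",
        "Content-Length": str(len(body)),
    }
    return headers, body


# Hypothetical sample page, not taken from the question.
page = "<!DOCTYPE html><title>日本語</title>"
headers, body = build_utf16_response(page)
print(headers["Content-Type"])

# A client that honours the declared charset recovers the page.
print(body.decode("utf-16") == page)

# A client that wrongly assumes UTF-8 cannot decode the same bytes.
try:
    body.decode("utf-8")
    assumed_utf8_ok = True
except UnicodeDecodeError:
    assumed_utf8_ok = False
print(assumed_utf8_ok)
```

The last check is the failure mode to watch for: without the declaration, a client guessing UTF-8 or ASCII has nothing sensible to work with.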
+8
  • Your bandwidth consumption is likely to nearly double if most of your HTML is ASCII
  • Clients that incorrectly assume UTF-8 (or ASCII) will be confused
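
The bandwidth point is easy to check for yourself. This is a rough Python sketch; the two sample strings are invented, and `utf-16-le` is used only so that the byte order mark does not skew the counts.

```python
def encoded_sizes(text: str) -> dict:
    """Return the number of bytes the text takes in UTF-8 and in UTF-16."""
    return {
        "utf-8": len(text.encode("utf-8")),
        # "utf-16-le" emits no BOM, so only the characters are counted.
        "utf-16": len(text.encode("utf-16-le")),
    }


# Hypothetical samples: ASCII-only markup versus Japanese body text.
ascii_markup = '<div class="post"><p>Hello</p></div>'
cjk_text = "日本語のテキスト"

print("markup:", encoded_sizes(ascii_markup))
print("CJK:   ", encoded_sizes(cjk_text))
```

ASCII markup takes exactly twice as many bytes in UTF-16, while these CJK characters take 2 bytes each in UTF-16 versus 3 in UTF-8. Whether a whole page grows or shrinks depends on the ratio of markup to text.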

Why do you want to upgrade to UTF-16?

+7

There is also byte order, which becomes an issue with anything wider than 8-bit data. UTF-16 encoded files usually begin with a byte order mark (BOM), which is used to determine the byte order, or endianness, of the file.

Wikipedia has a pretty good explanation for this.
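
For a concrete picture of the byte order issue, here is a small Python sketch. The helper `detect_utf16_byte_order` is a made-up name, and it only tells the two UTF-16 byte orders apart; it is not a general encoding detector.

```python
import codecs

text = "<p>"
little = text.encode("utf-16-le")
big = text.encode("utf-16-be")
print(little.hex(" "))  # 3c 00 70 00 3e 00
print(big.hex(" "))     # 00 3c 00 70 00 3e


def detect_utf16_byte_order(data: bytes) -> str:
    """Guess the UTF-16 byte order from a leading BOM, if there is one."""
    if data.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE
        return "little-endian"
    if data.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return "big-endian"
    return "unknown (no BOM)"


print(detect_utf16_byte_order(codecs.BOM_UTF16_LE + little))
print(detect_utf16_byte_order(codecs.BOM_UTF16_BE + big))
print(detect_utf16_byte_order(little))
```

The same three characters produce different byte sequences depending on the byte order, which is why a reader needs either the BOM or an explicit `utf-16le` / `utf-16be` label.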

+2

As far as I know, all modern browsers support UTF-16 encoding. But, as others have pointed out, you must explicitly declare the encoding. Not all browsers and platforms will support all Unicode characters, but I think that is true regardless of the encoding you use.

However, if bandwidth is a big issue, you should probably consider gzipping your HTML. This will save a lot more bandwidth than switching encodings.
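
A quick way to get a feel for this is to compress the same page in both encodings. The Python sketch below uses an invented, highly repetitive sample page, so the ratios it prints are much better than a real page would give; the helper name `raw_and_gzip_size` is also just for illustration.

```python
import gzip


def raw_and_gzip_size(text: str, encoding: str) -> tuple:
    """Return (raw byte count, gzipped byte count) for text in an encoding."""
    raw = text.encode(encoding)
    return len(raw), len(gzip.compress(raw))


# Hypothetical page: the same list item repeated many times.
sample_page = '<li class="item">日本語のテキスト</li>\n' * 500

for encoding in ("utf-8", "utf-16-le"):
    raw_size, gzip_size = raw_and_gzip_size(sample_page, encoding)
    print(f"{encoding}: {raw_size} bytes raw, {gzip_size} bytes gzipped")
```

Run it against your own pages to see how much of the difference between the two encodings is left after compression.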

+2

There is a very good article about this. The basics: "When a unique character encoding is required, the character encoding MUST be UTF-8, UTF-16 or UTF-32. US-ASCII is compatible with UTF-8 (a US-ASCII string is also a UTF-8 string, see [RFC 3629]), so UTF-8 is suitable for US-ASCII compatibility." In practice, US-ASCII compatibility is so useful that it is almost a requirement. The W3C wisely explains: "In other situations, such as APIs, UTF-16 or UTF-32 may be more appropriate. Possible reasons for choosing one include internal processing efficiency and compatibility with other processes."
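
The US-ASCII compatibility mentioned in the quote can be checked directly. This is a small Python sketch; `is_ascii_compatible` is a hypothetical helper that simply tests whether all 128 ASCII characters encode to the same bytes as they do in US-ASCII.

```python
def is_ascii_compatible(encoding: str) -> bool:
    """True if every ASCII character encodes to its single ASCII byte."""
    ascii_text = "".join(chr(code) for code in range(128))
    return ascii_text.encode(encoding) == ascii_text.encode("ascii")


for name in ("utf-8", "utf-16", "utf-32"):
    print(name, is_ascii_compatible(name))
```

Only UTF-8 passes, which is why tools that only understand ASCII can still read the markup of a UTF-8 page but not of a UTF-16 one.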

+2

I suspect most browsers won't even show your pages.

-6
