Language codes for Simplified Chinese and Traditional Chinese?

We are creating multilingual sub-sites on our website.

I would like to use two-letter language codes. Spanish and French are easy. They will get URLs such as:

mydomain.com/es mydomain.com/fr 

but I ran into a problem with Traditional and Simplified Chinese. Is there a standard for which two-letter codes are used for these languages?

 mydomain.com/zh mydomain.com/? 
+50
internationalization utf-8 cjk chinese-locale
Feb 03 '11 at 10:16
3 answers

@dkarp gives an excellent general answer. I will add some details specific to Chinese:

There are several countries where Chinese is the main written language. The main difference between them is whether they use simplified or traditional characters, but there are also minor regional differences (in vocabulary, etc.). The standard way to distinguish them is by country code, for example zh_CN for mainland China, zh_SG for Singapore, zh_TW for Taiwan, or zh_HK for Hong Kong.

Mainland China and Singapore use simplified characters, while the others use traditional characters. Since mainland China and Taiwan have the two largest populations, zh_CN and zh_TW are often used simply to mean the simplified and traditional character versions of a website.

It would be more correct to use zh_HANS for (generic) Simplified Chinese characters and zh_HANT for Traditional Chinese characters, except in the rare cases where it makes sense to distinguish between different countries.
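As a rough illustration of the mapping described above, here is a minimal Python sketch that converts a region-based Chinese code into a script-based one. The function name and the lookup table are hypothetical and cover only the four regions mentioned in this answer. The output uses the title-case spelling (Hans, Hant) that is conventional for script subtags; the subtags themselves are case-insensitive, so zh_HANS names the same thing.

```python
# Hypothetical lookup table: only the four regions named in this answer.
REGION_TO_SCRIPT = {
    "CN": "Hans",  # mainland China: simplified
    "SG": "Hans",  # Singapore: simplified
    "TW": "Hant",  # Taiwan: traditional
    "HK": "Hant",  # Hong Kong: traditional
}


def chinese_script_code(locale_code: str) -> str:
    """Map a code such as zh_TW or zh-cn to zh_Hant or zh_Hans."""
    parts = locale_code.replace("-", "_").split("_")
    if len(parts) != 2 or parts[0].lower() != "zh":
        raise ValueError(f"expected a code like zh_CN, got {locale_code!r}")
    region = parts[1].upper()
    if region not in REGION_TO_SCRIPT:
        raise ValueError(f"no script known for region {region!r}")
    return "zh_" + REGION_TO_SCRIPT[region]
```

With this, both zh_CN and zh_SG visitors could be routed to a single zh_Hans sub-site, and zh_TW and zh_HK visitors to zh_Hant.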

+104
Feb 04 '11 at 4:40

There is indeed a standard for this. Since people have run into the same problem you are seeing (the same language, but with different dialects or characters), the two-letter language code has been extended with a two-letter region code. So you might have a generic French page at mydomain.com/fr, but internationalization for French Canadian readers could leave you with mydomain.com/fr_CA (Canada) and mydomain.com/fr_FR (France). Some platforms use dashes instead of underscores to separate the language and region codes (hence fr-CA and fr-FR).
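Because the separator differs between platforms, it can help to normalize codes before using them in URLs or lookups. The following Python sketch shows one way to do that; the function name is hypothetical, and it handles only the language and language-plus-region forms discussed here.

```python
def normalize_locale(code: str, separator: str = "_") -> str:
    """Return codes like 'fr-ca' or 'FR_ca' in a canonical form such as 'fr_CA'."""
    parts = code.replace("-", "_").split("_")
    language = parts[0].lower()  # language codes are conventionally lowercase
    if len(parts) == 1:
        return language
    if len(parts) != 2:
        raise ValueError(f"expected language or language+region, got {code!r}")
    region = parts[1].upper()  # region codes are conventionally uppercase
    return language + separator + region
```

Passing `separator="-"` produces the dashed form (fr-CA) for platforms that expect it.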

The standard locale for Simplified Chinese is zh_CN. The standard locale for Traditional Chinese is zh_TW.

I hesitate to point you to the actual BCP 47 standards documents, as they are heavy on detail and light on readability. Just go with the standard locale identifiers, such as those used in Java, and you will be fine.

+25
Feb 04 '11 at 4:01

Language depends on where it is spoken (doh!), so locale and language codes reflect this reality. zh is the base language code, and since there are two main written forms of it, there are also zh_Hans and zh_Hant, but these are still language codes, not locales.

Location specific

To fully specify which language is used in a particular place, the country code still has to be added as a suffix, giving zh_Hans_HK and zh_Hant_HK for Simplified and Traditional Chinese, respectively, as used in Hong Kong.
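To make the language, script, and region parts concrete, here is a small Python sketch that splits such a code into its components. The `LanguageTag` type and `parse_tag` function are hypothetical names, and the parser deliberately handles only the `language[_Script][_REGION]` shape used in this answer, not every form a full language tag can take.

```python
from typing import NamedTuple, Optional


class LanguageTag(NamedTuple):
    language: str
    script: Optional[str]
    region: Optional[str]


def parse_tag(tag: str) -> LanguageTag:
    """Split codes like zh, zh_Hant, zh_HK or zh_Hant_HK into their parts."""
    parts = tag.replace("-", "_").split("_")
    language = parts[0].lower()
    script = None
    region = None
    for part in parts[1:]:
        is_script = len(part) == 4 and part.isalpha()
        is_region = (len(part) == 2 and part.isalpha()) or (
            len(part) == 3 and part.isdigit()
        )
        if is_script and script is None and region is None:
            script = part.title()  # four letters: script, e.g. Hant
        elif is_region and region is None:
            region = part.upper()  # two letters or three digits: region
        else:
            raise ValueError(f"unexpected subtag {part!r} in {tag!r}")
    return LanguageTag(language, script, region)
```

A tag with no region, such as zh_Hant, is then easy to recognize as a language code rather than a full locale.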

In reality, many places need something more specific than a country code, but supporting that would likely lead to an exponential increase in the complexity and maintenance of databases such as CLDR. In addition, the infrastructure needed to serve it, such as IP-derived location information, is not public or accurate enough.

Fixed text

Now, if the code only indicates which set of fixed strings to use in the user interface, or even which whole pages to serve on the site, the country suffix is not really needed unless there are more than a few places where the language varies enough by location to justify a whole separate set of resources.

The larger the set of resources, the more likely it is that a locale-based language variant code [in this context only a language attribute, not a true locale, so you can call it what you like!] will be required, but at least you only need to do this when necessary.

Values on the fly

However, if you want to format specific variable values such as dates, times, currencies, and numbers on the fly, locales become important, because all the tools that support this functionality (for example, those based on Unicode CLDR data) expect them. The locale used for formatting should be a separate setting from the code that selects which language resource set to use, unless you want to create a resource set for every known locale and maintain them ad nauseam!
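One way to keep the two settings separate is sketched below in Python. Everything here is an assumption for illustration: the `RequestSettings` type, the `settings_for` function, and the list of available resource sets are hypothetical. The full locale is kept untouched for a formatting library, while the resource set is chosen by dropping trailing subtags until a match is found. The formatting itself is left to whatever CLDR-based library you use and is not shown.

```python
from dataclasses import dataclass

# Hypothetical: the resource sets that actually exist on the site.
AVAILABLE_RESOURCE_SETS = {"es", "fr", "zh_Hans", "zh_Hant"}


@dataclass(frozen=True)
class RequestSettings:
    ui_language: str    # selects the set of fixed strings or pages
    format_locale: str  # handed to the date/number formatting library


def settings_for(locale_code: str) -> RequestSettings:
    """Pick the closest resource set but keep the full locale for formatting."""
    parts = locale_code.split("_")
    # Try zh_Hant_HK, then zh_Hant, then zh, until a resource set matches.
    for end in range(len(parts), 0, -1):
        candidate = "_".join(parts[:end])
        if candidate in AVAILABLE_RESOURCE_SETS:
            return RequestSettings(ui_language=candidate, format_locale=locale_code)
    raise LookupError(f"no resource set for {locale_code!r}")
```

For example, a zh_Hant_HK visitor would get the zh_Hant strings while dates and currencies are still formatted for Hong Kong.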

Browser Language Tools

Note that when you specify a language for editable content on a web page, such as an input field, and spell checking is enabled for that field through attributes or CSS, the browser's language tools will spell-check the field according to that language.

Criteria

You should be clear about what each set of resources is for, so consider:

  • Fixed strings? Language only.
  • Formatting on the fly? Locale.
  • Spell checking in the browser? Locale.
  • Entire pages / sub-sites? Language only, or locale (as a language variant) if significantly different content is required.

Spreadsheet to minimize maintenance

I use a spreadsheet to store user interface strings, where each language code has a parent code, so the cell for its version of a string contains a formula that fetches the string from the parent. To create a custom string for a given language and string, I just overwrite the cell's formula with the exact text. This minimizes the number of resources that have to be maintained. At the end I run a macro that generates a complete resource file for each language.
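The same parent-inheritance idea can be sketched outside a spreadsheet. The Python below is an illustration, not the macro described above: the `PARENTS` and `OVERRIDES` tables, the string keys, and the sample strings are all hypothetical. Each code stores only the strings that differ from its parent, and the complete resource set is generated by walking up the parent chain.

```python
# Hypothetical parent links between language codes.
PARENTS = {
    "zh": None,
    "zh_Hans": "zh",
    "zh_Hant": "zh",
    "zh_Hant_HK": "zh_Hant",
}

# Hypothetical sample data: each code holds only its own overrides.
OVERRIDES = {
    "zh": {"greeting": "你好"},
    "zh_Hans": {"settings": "设置"},
    "zh_Hant": {"settings": "設定"},
    "zh_Hant_HK": {},  # inherits everything from zh_Hant
}


def build_resource_file(code: str) -> dict:
    """Return the complete set of strings for a code, inherited from its parents."""
    chain = []
    current = code
    while current is not None:
        chain.append(current)
        current = PARENTS.get(current)
    resources = {}
    # Apply the root first so that more specific codes overwrite their parents.
    for ancestor in reversed(chain):
        resources.update(OVERRIDES.get(ancestor, {}))
    return resources
```

Adding a Hong Kong specific string then means adding one entry under zh_Hant_HK; every other string keeps coming from the parents.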

0
Jan 07 '17 at 4:09


