Unicode characters required for Japanese, Korean, and Chinese

I am trying to answer these basic questions without getting a degree in linguistics and the early history of mankind, which is where every search I run seems to lead.

  • What Unicode characters need to be included in the font in order to support Japanese text translation?

  • What Unicode characters need to be included in the font in order to support Chinese text translation?

  • What Unicode characters need to be included in the font in order to support Korean text translation?

fonts unicode multilanguage cjk
Oct 28 '15 at 15:28
3 answers

Start with the East Asian Scripts section of the Code Charts at unicode.org.

For example, Hiragana occupies U+3040 to U+309F, and Katakana occupies U+30A0 to U+30FF.
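
As a rough sketch of how block ranges like these can be used, the snippet below tests individual characters against them. Only the two ranges come from the code charts; the `KANA_BLOCKS` table and the `kana_block` helper are names made up for this illustration.

```python
# Block ranges quoted above; the table and function names are illustrative only.
KANA_BLOCKS = {
    "Hiragana": (0x3040, 0x309F),
    "Katakana": (0x30A0, 0x30FF),
}


def kana_block(ch):
    """Return the name of the kana block containing ch, or None."""
    cp = ord(ch)
    for name, (lo, hi) in KANA_BLOCKS.items():
        if lo <= cp <= hi:
            return name
    return None


if __name__ == "__main__":
    for ch in "あカA":
        print(f"U+{ord(ch):04X} {ch} -> {kana_block(ch)}")
```

The same table can be extended with any other block listed in the code charts.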

Oct 28 '15 at 17:47

It depends on how much coverage you want to provide for each of these languages. The most commonly used characters in all of these languages amount to only a few thousand, but with just those you will occasionally come across characters outside your coverage. As the number of characters your system supports grows, you become less likely to hit a missing character, until eventually you cover all of the CJK characters.
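
One way to put a number on that trade-off is to measure a candidate character set against sample text. The following is only a sketch: `coverage_report` is a hypothetical helper, and the "font" in the usage example is an assumed set containing nothing but the two kana blocks.

```python
from collections import Counter


def coverage_report(text, supported):
    """Return (fraction of characters covered, Counter of missing characters).

    `supported` is any set of code points (ints) the font is assumed to
    contain. Whitespace is ignored.
    """
    chars = [ch for ch in text if not ch.isspace()]
    missing = Counter(ch for ch in chars if ord(ch) not in supported)
    if not chars:
        return 1.0, missing
    covered = len(chars) - sum(missing.values())
    return covered / len(chars), missing


if __name__ == "__main__":
    # Hypothetical font that contains only the Hiragana and Katakana blocks.
    supported = set(range(0x3040, 0x3100))
    ratio, missing = coverage_report("さくら と 漢字", supported)
    print(f"{ratio:.0%} covered; missing: {dict(missing)}")
```

Running this over a representative corpus for each language shows which characters would actually be missed by a given subset.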

The usual approach modern font developers take to reduce the time and effort of creating a font, while still keeping enough characters to display most text, is to use the repertoires defined by pre-Unicode character sets such as Big5 (-HKSCS), GB2312 or GB 18030, as others have mentioned in a comment. Even then, it is still fairly common to encounter characters that are not supported.
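
A minimal sketch of deriving such a repertoire, assuming Python's built-in codecs (`gb2312`, `big5`, `euc_kr` and so on) are a close enough stand-in for the standards themselves; they may differ from the official tables in details. The helper names `in_legacy_set` and `bmp_repertoire` are made up for this example.

```python
def in_legacy_set(ch, encoding):
    """True if ch can be encoded with the given legacy codec."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def bmp_repertoire(encoding):
    """Collect non-ASCII BMP code points that the codec can encode."""
    return {
        cp
        for cp in range(0x80, 0x10000)
        if not 0xD800 <= cp <= 0xDFFF  # skip surrogates, which are not characters
        and in_legacy_set(chr(cp), encoding)
    }


if __name__ == "__main__":
    for codec in ("gb2312", "big5", "euc_kr"):
        print(codec, len(bmp_repertoire(codec)), "code points")
```

The resulting set can serve as the list of code points a font should cover for that legacy standard.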

Unicode also defined something called IICore: about ten thousand characters intended as the minimum necessary to support these languages. The Unicode database also records whether each of them is important for Chinese, Japanese, Korean, and so on. However, no one currently seems to use it.

Google and Adobe now produce the Noto CJK fonts, also known as the Source Han fonts, which are meant to cover as many CJK characters as possible. However, because of limitations in the file format, a single font can hold only about 65,535 glyphs, so they have had to choose which characters to add and which to drop while creating them.

Finally, for Korean specifically, supporting only Hangul / Jamo is probably good enough in many cases, because Hanja (ideographic characters) are largely unused outside specialized fields. Note that people's names and some words in titles are among the places where Hanja may still appear, so it depends on whether those matter to you.
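
To judge whether Hangul-only support is enough for your data, a quick scan for Hanja can help. This is a sketch under stated assumptions: the block ranges below are taken from the Unicode code charts rather than from this answer, treating the two ideograph blocks as "Hanja" is a simplification, and the helper names are hypothetical.

```python
# Ranges from the Unicode code charts; an assumption of this sketch.
HANGUL_RANGES = [
    (0x1100, 0x11FF),  # Hangul Jamo
    (0x3130, 0x318F),  # Hangul Compatibility Jamo
    (0xAC00, 0xD7AF),  # Hangul Syllables
]
HANJA_RANGES = [
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xF900, 0xFAFF),  # CJK Compatibility Ideographs
]


def _in_ranges(ch, ranges):
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ranges)


def hanja_in(text):
    """List the ideographs found in text, in order of appearance."""
    return [ch for ch in text if _in_ranges(ch, HANJA_RANGES)]


def is_hangul_only(text):
    """True if every non-ASCII character in text falls in a Hangul block."""
    return all(_in_ranges(ch, HANGUL_RANGES) for ch in text if ord(ch) > 0x7F)


if __name__ == "__main__":
    for sample in ("대한민국", "大韓民國 대한민국"):
        print(sample, is_hangul_only(sample), hanja_in(sample))
```

If `hanja_in` returns anything for names or titles you need to display, Hangul-only coverage will not be sufficient for that text.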

Nov 13 '17 at 22:27

You can approximate such lists by looking at the corresponding Unicode properties (in particular, the "Script" of each character), but this does not fully reflect the actual use of the characters.
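
Python's standard `unicodedata` module does not expose the Script property, so the sketch below approximates it from character names instead. The prefix table and the helper names are assumptions made for this example, and the result is cruder than the real property (a few marks whose Script is Common have names starting with "KATAKANA", for instance).

```python
import unicodedata

# Name prefixes used as a rough stand-in for the Script property.
PREFIX_TO_SCRIPT = {
    "CJK UNIFIED IDEOGRAPH": "Han",
    "CJK COMPATIBILITY IDEOGRAPH": "Han",
    "HIRAGANA": "Hiragana",
    "KATAKANA": "Katakana",
    "HANGUL": "Hangul",
}


def approx_script(ch):
    """Guess the script of ch from its Unicode name, or return None."""
    name = unicodedata.name(ch, "")
    for prefix, script in PREFIX_TO_SCRIPT.items():
        if name.startswith(prefix):
            return script
    return None


def group_by_script(text):
    """Map each detected script to the set of characters seen in text."""
    groups = {}
    for ch in text:
        script = approx_script(ch)
        if script is not None:
            groups.setdefault(script, set()).add(ch)
    return groups


if __name__ == "__main__":
    for script, chars in group_by_script("日本語のテキスト 한국어").items():
        print(script, "".join(sorted(chars)))
```

For exact results, the Script values published in the Unicode Character Database would be the better source.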

A better indicator is the character collections already defined for fonts in these languages (e.g. Adobe-Japan1-6, Adobe-GB1-5 and Adobe-Korea1-2), described in this technical note (the exact character sets are defined separately). The CMap files should allow you to map them back to Unicode code points.

Oct 29 '15 at 2:53