What is the minimum Unicode character set for reasonable Japanese support?

I have a mobile application that needs to be ported for a Japanese audience. Part of the application is a custom font file, which will need to be extended from containing only Latin-1 characters to also containing Japanese characters. I understand that this will make it rather large, but that is not today's problem.

Please note: I cannot control the text that this application will display, so the font must cover enough characters to display user-generated content.

This is what I believe to be the maximal set of Unicode ranges that would cover everything required.

 Compatibility                        U+3300 - U+33FF
 Compatibility forms                  U+FE30 - U+FE4F
 Compatibility ideographs             U+F900 - U+FAFF
 Compatibility ideographs supplement  U+2F800 - U+2FA1F
 Radicals supplement                  U+2E80 - U+2EFF
 Strokes                              U+31C0 - U+31EF
 Symbols and punctuation              U+3000 - U+303F
 Unified Ideographs                   U+4E00 - U+9FBB
 Unified Ideographs ext. A            U+3400 - U+4DB5
 Unified Ideographs ext. B            U+20000 - U+2A6D6
 Enclosed letters and months          U+3200 - U+32FF
 Hiragana                             U+3040 - U+309F
 Kanbun                               U+3190 - U+319F
 Katakana                             U+30A0 - U+30FF
 Katakana phonetic                    U+31F0 - U+31FF

I need to know:

  • Is anything missing from this list?
  • Is anything clearly not required?
  • Is anything non-essential, and why could it be considered so?
+7
text fonts unicode internationalization
2 answers

Summary of Key Characters

 Enclosed Alphanumerics               U+2460 - U+2473
                                      U+2474 - U+24E9 *
                                      U+24EA - U+24FF
 Miscellaneous Symbols                U+2600 - U+2607
                                      U+260E - U+260F
                                      U+2614 - U+2615
                                      U+2618
                                      U+263D - U+2653
                                      U+2660 - U+266F
 Symbols and punctuation              U+3000 - U+303F
 Hiragana                             U+3040 - U+309F
 Katakana                             U+30A0 - U+30FF
 Katakana phonetic                    U+31F0 - U+31FF
 Enclosed letters and months          U+321F - U+325F *
                                      U+3280 - U+32FF *
 Unified Ideographs ext. A            U+3400 - U+4DB5
 Unified Ideographs                   U+4E00 - U+9FBB
 Compatibility ideographs             U+F900 - U+FAFF
 Compatibility forms                  U+FE30 - U+FE4F
 Full-Width Roman                     U+FF00 - U+FF5E
 Half-Width Katakana                  U+FF61 - U+FF9F
 Full- and Half-Width Symbols         U+FFE0 - U+FFEE
 Unified Ideographs ext. B            U+20000 - U+2A6D6
 Compatibility ideographs supplement  U+2F800 - U+2FA1F

 * = Lower priority
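
If it helps, the summary can be written down as data and used to check sample text before you commit to a glyph set. This is only a sketch in Python: the names `ESSENTIAL_RANGES`, `LOWER_PRIORITY_RANGES`, `LATIN_1` and `uncovered` are made up for illustration, and it assumes the font already covers Latin-1 as the question says.

```python
# Inclusive code point ranges copied from the summary above.
ESSENTIAL_RANGES = [
    (0x2460, 0x2473), (0x24EA, 0x24FF),                    # circled numbers
    (0x2600, 0x2607), (0x260E, 0x260F), (0x2614, 0x2615),  # misc. symbols
    (0x2618, 0x2618), (0x263D, 0x2653), (0x2660, 0x266F),
    (0x3000, 0x303F), (0x3040, 0x309F), (0x30A0, 0x30FF), (0x31F0, 0x31FF),
    (0x3400, 0x4DB5), (0x4E00, 0x9FBB),
    (0xF900, 0xFAFF), (0xFE30, 0xFE4F),
    (0xFF00, 0xFF5E), (0xFF61, 0xFF9F), (0xFFE0, 0xFFEE),
    (0x20000, 0x2A6D6), (0x2F800, 0x2FA1F),
]
# Ranges marked with * in the summary.
LOWER_PRIORITY_RANGES = [(0x2474, 0x24E9), (0x321F, 0x325F), (0x3280, 0x32FF)]
# Assumption: the existing font already covers Latin-1.
LATIN_1 = [(0x0000, 0x00FF)]


def in_ranges(ch, ranges):
    """True if the character's code point falls inside any (lo, hi) range."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ranges)


def uncovered(text, ranges):
    """Distinct characters of text that no range covers, in first-seen order."""
    missing = []
    for ch in text:
        if not in_ranges(ch, ranges) and ch not in missing:
            missing.append(ch)
    return missing


print(uncovered("日本語 ABC ①㌘", LATIN_1 + ESSENTIAL_RANGES))  # ['㌘']
```

Running real user content through something like `uncovered` is a cheap way to see whether a range you dropped actually matters for your audience.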

Full explanation

Don't forget the full-width Roman characters (FF00-FF5E), which are frequently used in Japanese text, and the half-width katakana (FF61-FF9F). You will probably also need the full-width and half-width symbols (FFE0-FFEE).
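
To see how these width variants relate to plain ASCII and ordinary katakana, here is a small Python sketch using the standard `unicodedata` module. The helper names `width_class` and `fold_width` are hypothetical. Folding with NFKC is shown only to illustrate the relationship; it is not a substitute for having the glyphs, since users do type these characters as-is.

```python
import unicodedata


def width_class(ch):
    """East Asian Width property: 'F' = full-width, 'H' = half-width, etc."""
    return unicodedata.east_asian_width(ch)


def fold_width(text):
    """Map width variants to their ordinary forms via NFKC normalization."""
    return unicodedata.normalize("NFKC", text)


print(width_class("Ａ"), width_class("ｶ"))  # F H
print(fold_width("ＡＢＣ１２３"))            # ABC123
print(fold_width("ｶﾀｶﾅ"))                  # カタカナ
```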

Arguably, the Kanbun annotation marks (3190-319F) are not commonly used. Kanbun is an old style of Japanese, usually taught in school, that is written entirely in Chinese characters (no hiragana or katakana) and follows a different set of grammar rules. The annotation marks would only be used if someone were explaining how to read one of these passages, which is unlikely. Include them for completeness if you like, but they are not a high priority.

CJK Compatibility (3300-33FF) is commonly used by newspapers and other print media, but it will almost certainly not be used by a general audience (I have not seen it on a website). In any case, these characters have equivalent long forms (for example, ㌘ can be written as グラム), so this block also falls into the non-essential category.
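
If you do leave this block out of the font, one possible fallback (my assumption, not something you have to do) is to expand these characters to their long forms before rendering. Unicode's NFKC normalization already knows the long forms, so a sketch in Python can be short. The function name `expand_squared_words` is made up for illustration.

```python
import unicodedata


def expand_squared_words(text):
    """Replace characters in 3300-33FF with their NFKC long forms.

    Characters outside that block are left untouched.
    """
    out = []
    for ch in text:
        if 0x3300 <= ord(ch) <= 0x33FF:
            out.append(unicodedata.normalize("NFKC", ch))
        else:
            out.append(ch)
    return "".join(out)


print(expand_squared_words("重さ5㌘"))  # 重さ5グラム
```

Restricting the expansion to the one block keeps the rest of the text, including full-width letters, exactly as the user typed it.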

The CJK Radicals Supplement (2E80-2EFF) is also non-essential, though it may see some use. These are not complete characters but the "radicals" (base components) of characters. They can be used to explain how a character is composed, but are unlikely to appear in normal language use.

CJK Strokes (31C0-31E3) is in the same situation as CJK Radicals, and is even less likely to appear in everyday use.

The first part of Enclosed CJK Letters and Months (3200-321E) is not needed; these are Korean characters. The same goes for 3260-327F. The rest of the block sees little use, but I would include it for completeness, because someone will probably try to use it at some point. You can treat it as lower priority.
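
You can verify the Korean sub-ranges yourself from the character names. A rough Python sketch follows; `korean_code_points` is a hypothetical helper, and matching on the words "HANGUL" and "KOREAN" in the Unicode names is just a heuristic that happens to work for this block.

```python
import unicodedata


def korean_code_points(start=0x3200, end=0x32FF):
    """Code points in the block whose Unicode name mentions Hangul or Korean."""
    found = []
    for cp in range(start, end + 1):
        name = unicodedata.name(chr(cp), "")  # "" for unassigned code points
        if "HANGUL" in name or "KOREAN" in name:
            found.append(cp)
    return found


korean = korean_code_points()
print(f"U+{korean[0]:04X} .. U+{korean[-1]:04X}")
```

Everything this finds falls inside 3200-321E or 3260-327F, so the rest of the block can be kept as a unit.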

The remaining ranges in your original list are important.

Also not in your list: Enclosed Alphanumerics (2460-24FF). Circled numbers (2460-2473 and 24EA-24FF) are used relatively frequently. However, the parenthesized numbers, the numbers followed by a period, and the parenthesized and circled Latin letters (2474-24E9) can be omitted as non-essential.

In addition, you may want to include parts of Miscellaneous Symbols (the block at 2600-26FF), although some of these characters are used more often than others. The definitely important ones include some weather symbols (2600-2607), the shamrock (2618), telephones (260E-260F), the umbrella and hot beverage (2614-2615), astrological and zodiac symbols (263D-2653), and the card suits, hot springs and musical symbols (2660-266F).

+13

From a technical point of view, you should include:

  1. Arabic numerals (0, 1 .. 9)
  2. Punctuation marks (! " # $ % ' ...)
  3. Roman letters (A..Z, a..z), both half-width and full-width

Items 1-3 basically amount to ASCII support. Beyond that, you should include:

  • Hiragana
  • Katakana
  • Japanese punctuation
  • Joyo Kanji (a list of about 2,000 kanji approved by the Japanese government for use in newspapers, etc.)
  • Jinmeiyo Kanji (another list compiled by the Japanese government, for use in personal names)

Altogether this gives you roughly 2,600 kanji, which is enough to display most ordinary text you will find on the Internet. There are a few exceptions where a character is common but not in Joyo (e.g. 沢).

The problem is that Unicode is not organized around the Joyo kanji list, so you would have to pick and choose within ranges. Probably the easiest approach is to add every kanji that exists in Japanese, even if it is not commonly used or not part of Joyo.
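
One way to approximate "every kanji that exists in Japanese" without hand-picking is to keep only the ideographs that a Japanese legacy encoding can represent. The Python sketch below uses the standard `shift_jis` codec (JIS X 0208) as that filter. Treating JIS X 0208 as the definition of "Japanese kanji" is my assumption, and both function names are made up for illustration.

```python
def jis_x_0208_kanji(start=0x4E00, end=0x9FFF):
    """Code points in the main ideograph block that Shift_JIS can encode."""
    kept = []
    for cp in range(start, end + 1):
        try:
            chr(cp).encode("shift_jis")
        except UnicodeEncodeError:
            continue  # not representable in JIS X 0208, skip it
        kept.append(cp)
    return kept


def to_ranges(code_points):
    """Collapse code points into inclusive (lo, hi) runs of consecutive values."""
    runs = []
    for cp in sorted(code_points):
        if runs and cp == runs[-1][1] + 1:
            runs[-1][1] = cp
        else:
            runs.append([cp, cp])
    return [(lo, hi) for lo, hi in runs]


kanji = jis_x_0208_kanji()
print(len(kanji), "kanji in", len(to_ranges(kanji)), "ranges")
```

The resulting ranges are heavily fragmented, which is exactly the "pick and choose within ranges" problem; most font subsetting tools accept an explicit list of code points, so the fragmentation is usually not an obstacle.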

0