What is the full range for Chinese characters in Unicode?

U+4E00..U+9FFF is part of the complete set, but not all of it.

+80
unicode cjk
Sep 02 '09 at 6:13
6 answers

You may be able to find the complete list through the CJK Unicode FAQ (which covers "Chinese, Japanese, and Korean" characters).

The "East Asian Scripts" document mentions:

Blocks containing Han ideographs

Han ideographic characters are found in five main blocks of the Unicode Standard, as shown in Table 12-2.

Table 12-2. Blocks containing Han ideographs

Block                                     Range         Comment
CJK Unified Ideographs                    4E00-9FFF     Common
CJK Unified Ideographs Extension A        3400-4DBF     Rare
CJK Unified Ideographs Extension B        20000-2A6DF   Rare, historic
CJK Unified Ideographs Extension C        2A700-2B73F   Rare, historic
CJK Unified Ideographs Extension D        2B740-2B81F   Uncommon, some in current use
CJK Unified Ideographs Extension E        2B820-2CEAF   Rare, historic
CJK Compatibility Ideographs              F900-FAFF     Duplicates, unifiable variants, corporate characters
CJK Compatibility Ideographs Supplement   2F800-2FA1F   Unifiable variants

Note: the block ranges can change over time; the latest ones are listed under CJK Unified Ideographs.
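As a rough illustration, the block ranges from Table 12-2 can be turned into a lookup. This is only a sketch: the name `isInHanBlock` is made up for the example, and the ranges are copied from the table as quoted here, so they will miss blocks added in later Unicode versions.

```javascript
// Block ranges copied from Table 12-2 as quoted above (not a complete, current list).
const HAN_BLOCKS = [
  [0x4E00, 0x9FFF],   // CJK Unified Ideographs
  [0x3400, 0x4DBF],   // CJK Unified Ideographs Extension A
  [0x20000, 0x2A6DF], // CJK Unified Ideographs Extension B
  [0x2A700, 0x2B73F], // CJK Unified Ideographs Extension C
  [0x2B740, 0x2B81F], // CJK Unified Ideographs Extension D
  [0x2B820, 0x2CEAF], // CJK Unified Ideographs Extension E
  [0xF900, 0xFAFF],   // CJK Compatibility Ideographs
  [0x2F800, 0x2FA1F], // CJK Compatibility Ideographs Supplement
];

// Hypothetical helper: true if the first code point of `ch` falls inside a listed block.
function isInHanBlock(ch) {
  const cp = ch.codePointAt(0); // undefined for an empty string
  return cp !== undefined && HAN_BLOCKS.some(([lo, hi]) => cp >= lo && cp <= hi);
}

console.log(isInHanBlock('\u4E2D'));    // true  (U+4E2D, main block)
console.log(isInHanBlock('\u{20000}')); // true  (first code point of Extension B)
console.log(isInHanBlock('A'));         // false
```

Note that a block range also contains unassigned code points, so a hit only means "inside the block", not "an assigned character".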

See also Wikipedia:

+92
Sep 02 '09 at 6:27

Unicode currently has 74,605 CJK characters. CJK characters include not only the characters used in Chinese, but also Japanese Kanji, Korean Hanja, and Vietnamese Chu Nom. Some CJK characters are not Chinese characters.

1) 20,941 characters from the CJK Unified Ideographs block.

Code points U+4E00 to U+9FCC.

2) 6,582 characters from the CJKUI Ext A block.

Code points U+3400 to U+4DB5. Unicode 3.0 (1999).

3) 42,711 characters from the CJKUI Ext B block.

Code points U+20000 to U+2A6D6. Unicode 3.1 (2001).

4) 4,149 characters from the CJKUI Ext C block.

Code points U+2A700 to U+2B734. Unicode 5.2 (2009).

5) 222 characters from the CJKUI Ext D block.

Code points U+2B740 to U+2B81D. Unicode 6.0 (2010).

6) The CJKUI Ext E block.

Coming soon.

If the above is not spaghetti enough, take a look at the known issues. Good luck =)
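The per-block counts follow directly from the code point ranges (last minus first, plus one), and they add up to the total quoted at the top of this answer. A small sketch of that arithmetic, using only the ranges listed above (the variable names are just for illustration):

```javascript
// [first, last, label] for each range listed in this answer.
const RANGES = [
  [0x4E00, 0x9FCC, 'CJK Unified Ideographs'],
  [0x3400, 0x4DB5, 'CJKUI Ext A'],
  [0x20000, 0x2A6D6, 'CJKUI Ext B'],
  [0x2A700, 0x2B734, 'CJKUI Ext C'],
  [0x2B740, 0x2B81D, 'CJKUI Ext D'],
];

// Size of an inclusive range is last - first + 1.
const sizes = RANGES.map(([first, last]) => last - first + 1);
const total = sizes.reduce((sum, n) => sum + n, 0);

RANGES.forEach(([, , label], i) => console.log(label, sizes[i]));
console.log('total', total); // total 74605
```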

+45
Jul 10 2018-12-14T00:

Exact ranges for Chinese characters (excluding the extensions): [\u2E80-\u2FD5\u3190-\u319F\u3400-\u4DBF\u4E00-\u9FCC\uF900-\uFAAD].

  1. [\u2E80-\u2FD5]

CJK Radicals Supplement is a Unicode block containing alternative, often positional, forms of the Kangxi radicals. They are used as headers in dictionary indices and other CJK ideographic collections organized by radical-stroke. This range also runs on through the Kangxi Radicals block (U+2F00 to U+2FD5).

  2. [\u3190-\u319F]

Kanbun is a Unicode block containing annotation characters used in Japanese copies of classical Chinese texts to indicate reading order.

  3. [\u3400-\u4DBF]

CJK Unified Ideographs Extension A is a Unicode block containing rare Han ideographs.

  4. [\u4E00-\u9FCC]

CJK Unified Ideographs is a Unicode block containing the most common CJK ideographs used in modern Chinese and Japanese.

  5. [\uF900-\uFAAD]

CJK Compatibility Ideographs is a Unicode block created to contain Han characters that were encoded in multiple locations in other established character encodings, in addition to their CJK Unified Ideographs assignments, in order to retain round-trip compatibility between Unicode and those encodings.

See here for more details; the extensions are covered in other answers.
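A minimal sketch of how the character class above might be used in JavaScript. The constant names are made up for the example. The second pattern is an assumption about how one could bolt on an extension range: code points above U+FFFF need the `u` flag and the `\u{...}` escape form, because without the flag the regex sees surrogate halves rather than whole code points.

```javascript
// The BMP-only character class from this answer.
const BMP_HAN = /[\u2E80-\u2FD5\u3190-\u319F\u3400-\u4DBF\u4E00-\u9FCC\uF900-\uFAAD]/;

// Illustrative variant that also accepts Extension B (range taken from the other answers).
const WITH_EXT_B = /[\u3400-\u4DBF\u4E00-\u9FCC\u{20000}-\u{2A6DF}]/u;

console.log(BMP_HAN.test('\u6C49\u5B57')); // true  (two common Chinese characters)
console.log(BMP_HAN.test('hello'));        // false
console.log(BMP_HAN.test('\u{20000}'));    // false (stored as a surrogate pair)
console.log(WITH_EXT_B.test('\u{20000}')); // true
```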

+16
Dec 15 '16 at 2:19

Unicode version 11.0.0

In Unicode, the Chinese, Japanese, and Korean (CJK) scripts share a common repertoire known as CJK characters.

These ranges often contain unassigned or reserved code points (such as U+2E9A and U+2EF4 to U+2EFF).

Chinese characters

 bottom  top    reference (also have a look at the wiki page)    block name
 4E00    9FEF   http://www.unicode.org/charts/PDF/U4E00.pdf      CJK Unified Ideographs
 3400    4DBF   http://www.unicode.org/charts/PDF/U3400.pdf      CJK Unified Ideographs Extension A
 20000   2A6DF  http://www.unicode.org/charts/PDF/U20000.pdf     CJK Unified Ideographs Extension B
 2A700   2B73F  http://www.unicode.org/charts/PDF/U2A700.pdf     CJK Unified Ideographs Extension C
 2B740   2B81F  http://www.unicode.org/charts/PDF/U2B740.pdf     CJK Unified Ideographs Extension D
 2B820   2CEAF  http://www.unicode.org/charts/PDF/U2B820.pdf     CJK Unified Ideographs Extension E
 2CEB0   2EBEF  https://www.unicode.org/charts/PDF/U2CEB0.pdf    CJK Unified Ideographs Extension F
 3007    3007   https://zh.wiktionary.org/wiki/%E3%80%87         in block CJK Symbols and Punctuation
  • In the CJK Unified Ideographs block, I noticed that many answers use 9FCC as the upper bound, but U+9FCD (鿍) is indeed a Chinese character. All the characters in this block are Chinese characters (also used in Japanese, Korean, etc.).
  • Most of the characters in the CJK Unified Ideographs extensions are traditional Chinese characters that are rarely used in China. Ext F is the exception: only 17% of Ext F are Chinese characters.
  • 〇 (U+3007) is the Chinese character form of zero and is still in use today.

Therefore the ranges are:

[0x3007,0x3007], [0x3400,0x4DBF], [0x4E00,0x9FEF], [0x20000,0x2EBFF]
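To show how these four ranges might be applied to real text, here is a hedged sketch that counts the Chinese characters in a string. The helper names are invented for the example, and the bounds are copied exactly as written above (the last one, 0x2EBFF, is slightly past the 2EBEF that the table gives for the end of Extension F).

```javascript
// Ranges copied verbatim from the line above.
const CHINESE_RANGES = [
  [0x3007, 0x3007],
  [0x3400, 0x4DBF],
  [0x4E00, 0x9FEF],
  [0x20000, 0x2EBFF],
];

// Hypothetical helper: is this code point inside one of the ranges?
function isChineseCodePoint(cp) {
  return CHINESE_RANGES.some(([lo, hi]) => cp >= lo && cp <= hi);
}

// for...of walks a string by code point, so supplementary-plane
// characters are not split into two surrogate halves.
function countChinese(text) {
  let count = 0;
  for (const ch of text) {
    if (isChineseCodePoint(ch.codePointAt(0))) count += 1;
  }
  return count;
}

console.log(countChinese('\u4E8C\u3007\u4E00\u4E5D\u5E74 new year')); // 5
console.log(countChinese('\u9FCD'));                                  // 1 (U+9FCD is covered)
console.log(countChinese('\u{20000}'));                               // 1 (one code point, not two)
```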

CJK characters that are never used in Chinese

These are ordinary Han characters that exist only for compatibility.

You will almost never see them in any Chinese book, article, letter, etc.

Every character here has one corresponding Chinese character with an identical glyph. For example, 金 (U+F90A) and 金 (U+91D1) have the same glyph.

 F900    FAFF   https://www.unicode.org/charts/PDF/UF900.pdf     CJK Compatibility Ideographs
 2F800   2FA1F  https://www.unicode.org/charts/PDF/U2F800.pdf    CJK Compatibility Ideographs Supplement
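The U+F90A / U+91D1 pair mentioned above can be checked with Unicode normalization, since U+F90A carries a canonical mapping to U+91D1. A small sketch using the standard `String.prototype.normalize`; I have not checked every code point in the two compatibility blocks, so treat this as an illustration for this one pair rather than a rule for the whole range.

```javascript
const compat = '\uF90A';  // CJK COMPATIBILITY IDEOGRAPH-F90A
const unified = '\u91D1'; // the same glyph in CJK Unified Ideographs

console.log(compat === unified);                  // false: different code points
console.log(compat.normalize('NFC') === unified); // true: normalization folds the duplicate
console.log(compat.normalize('NFC').codePointAt(0).toString(16)); // 91d1
```

Normalizing input before applying a range check is one way to avoid having to list the compatibility blocks at all.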

CJK-related characters

 2E80    2EFF   http://www.unicode.org/charts/PDF/U2E80.pdf      CJK Radicals Supplement
 2F00    2FDF   http://www.unicode.org/charts/PDF/U2F00.pdf      Kangxi Radicals
 2FF0    2FFF   https://unicode.org/charts/PDF/U2FF0.pdf         Ideographic Description Characters
 3000    303F   https://www.unicode.org/charts/PDF/U3000.pdf     CJK Symbols and Punctuation
 3100    312F   https://unicode.org/charts/PDF/U3100.pdf         Bopomofo
 31A0    31BF   https://unicode.org/charts/PDF/U31A0.pdf         Bopomofo Extended
 31C0    31EF   http://www.unicode.org/charts/PDF/U31C0.pdf      CJK Strokes
 3200    32FF   https://unicode.org/charts/PDF/U3200.pdf         Enclosed CJK Letters and Months
 3300    33FF   https://unicode.org/charts/PDF/U3300.pdf         CJK Compatibility
 FE30    FE4F   https://www.unicode.org/charts/PDF/UFE30.pdf     CJK Compatibility Forms
 FF00    FFEF   https://www.unicode.org/charts/PDF/UFF00.pdf     Halfwidth and Fullwidth Forms
 1F200   1F2FF  https://www.unicode.org/charts/PDF/U1F200.pdf    Enclosed Ideographic Supplement
  • Some blocks, such as Hangul Compatibility Jamo, are left out because they are not relevant to Chinese.
  • Kangxi Radicals are not Chinese characters. They are graphical components of Chinese characters and are used specifically to represent the radicals, e.g. ⼻ (U+2F3B) versus 彳 (U+5F73), and ⻜ (U+2EDC) versus 飞 (U+98DE).
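The radical/character distinction is visible in code as well. As a sketch for the first pair only: U+2F3B has a compatibility mapping to U+5F73, so NFKC normalization turns the radical into the ordinary character, while NFC leaves it alone. I am not claiming the same holds for every radical listed in these blocks.

```javascript
const radical = '\u2F3B';   // KANGXI RADICAL STEP
const ideograph = '\u5F73'; // the ordinary Chinese character with the same shape

console.log(radical === ideograph);                   // false
console.log(radical.normalize('NFC') === ideograph);  // false: canonical forms keep the radical
console.log(radical.normalize('NFKC') === ideograph); // true: compatibility mapping applied
```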

Other common punctuation that appears in Chinese

This is a wide range. Some of these punctuation marks may never be used, while others, such as ……"", are used very often in Chinese.

 0000    007F   https://unicode.org/charts/PDF/U0000.pdf         C0 Controls and Basic Latin
 2000    206F   https://unicode.org/charts/PDF/U2000.pdf         General Punctuation
 ……

There are also many Chinese-related symbols, such as the Yijing hexagram symbols or Kanbun, but they are off topic here. I list the non-Chinese characters in CJK only to better explain what Chinese characters are. The ranges above already cover almost all the characters that appear in Chinese writing, apart from mathematics and other specialized notation.

Additional

CJK Symbols and Punctuation

  、。〃〄々〆〇〈〉《》「」『』【】〒〓〔〕〖〗〘〙〚〛〜〝〞〟〠〡〢〣〤〥〦〧〨〩〪〭〮〯〫〬〰〱〲〳〴〵〶〷〸〹〺〻〼〽 〾 〿 

Halfwidth and Fullwidth Forms

 !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~⦅⦆。「」、・ヲァィゥェォャュョッーアイウエオカキクケコサシスセソタチツテトナニヌネノハヒフヘホマミムメモヤユヨラリルレロワン゙゚ᄀᄁᆪᄂᆬᆭᄃᄄᄅᆰᆱᆲᆳᆴᆵᄚᄆᄇᄈᄡᄉᄊᄋᄌᄍᄎᄏᄐᄑ하ᅢᅣᅤᅥᅦᅧᅨᅩᅪᅫᅬᅭᅮᅯᅰᅱᅲᅳᅴᅵ¢£¬ ̄¦¥₩│←↑→↓■○ 

References

  1. https://zh.wikipedia.org/wiki/%E6%B1%89%E5%AD%97 (in Chinese, pay attention to the right side panel)
  2. https://zh.wikipedia.org/wiki/%E4%B8%AD%E6%97%A5%E9%9F%93%E7%9B%B8%E5%AE%B9%E8%A1%A8%E6%84%8F%E6%96%87%E5%AD%97 (pay attention to the bottom table)
  3. http://www.unicode.org
+3
Feb 18 '19 at 0:40

The Unicode blocks given in the other answers certainly cover most Chinese characters, but check out some of these other blocks too.

 CJK_UNIFIED_IDEOGRAPHS
 CJK_UNIFIED_IDEOGRAPHS_EXTENSION_A
 CJK_UNIFIED_IDEOGRAPHS_EXTENSION_B
 CJK_UNIFIED_IDEOGRAPHS_EXTENSION_C
 CJK_UNIFIED_IDEOGRAPHS_EXTENSION_D
 CJK_UNIFIED_IDEOGRAPHS_EXTENSION_E
 CJK_COMPATIBILITY
 CJK_COMPATIBILITY_FORMS
 CJK_COMPATIBILITY_IDEOGRAPHS
 CJK_COMPATIBILITY_IDEOGRAPHS_SUPPLEMENT
 CJK_RADICALS_SUPPLEMENT
 CJK_STROKES
 CJK_SYMBOLS_AND_PUNCTUATION
 ENCLOSED_CJK_LETTERS_AND_MONTHS
 ENCLOSED_IDEOGRAPHIC_SUPPLEMENT
 KANGXI_RADICALS
 IDEOGRAPHIC_DESCRIPTION_CHARACTERS

See my fuller discussion here. This site is also convenient for browsing Unicode.
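As an alternative to enumerating blocks by hand, JavaScript engines that support Unicode property escapes (ES2018 and later) can match on the Han script property instead. This is a sketch under that assumption, with made-up constant names. Note that the script property and the block list above are not the same thing: for example, most of the CJK Symbols and Punctuation block is not Han script, so the two approaches will disagree on some code points.

```javascript
// Matches any single code point whose Script property is Han.
const HAN = /\p{Script=Han}/u;

// Illustrative helper: true if the whole (non-empty) string is Han script.
const isOnlyHan = (text) => /^\p{Script=Han}+$/u.test(text);

console.log(HAN.test('\u6F22'));          // true  (a Han ideograph)
console.log(HAN.test('\u3042'));          // false (Hiragana)
console.log(HAN.test('A'));               // false
console.log(isOnlyHan('\u4E2D\u6587'));   // true
console.log(isOnlyHan('\u4E2D\u6587!'));  // false
```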

+1
Feb 01 '17 at 16:20

To summarize, it looks like this:

 var blocks = [
   [0x3400, 0x4DB5], [0x4E00, 0x62FF], [0x6300, 0x77FF], [0x7800, 0x8CFF],
   [0x8D00, 0x9FCC], [0x2E80, 0x2FD5], [0x3190, 0x319F], [0x3400, 0x4DBF],
   [0x4E00, 0x9FCC], [0xF900, 0xFAAD], [0x20000, 0x215FF], [0x21600, 0x230FF],
   [0x23100, 0x245FF], [0x24600, 0x260FF], [0x26100, 0x275FF], [0x27600, 0x290FF],
   [0x29100, 0x2A6DF], [0x2A700, 0x2B734], [0x2B740, 0x2B81D]
 ];
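Several entries in this list overlap or sit back to back (for example, both [0x3400, 0x4DB5] and [0x3400, 0x4DBF] appear, and the 0x4E00 to 0x9FCC span is listed both whole and in chunks). A hedged sketch of one way to tidy such a list: sort by start, then merge ranges that overlap or touch. `mergeRanges` is a made-up helper name, not part of the original answer.

```javascript
// Hypothetical helper: sort ranges by start and merge any that overlap or are adjacent.
function mergeRanges(ranges) {
  const sorted = ranges
    .map(([lo, hi]) => [lo, hi]) // copy so the input is not mutated
    .sort((a, b) => a[0] - b[0] || a[1] - b[1]);
  const merged = [];
  for (const [lo, hi] of sorted) {
    const last = merged[merged.length - 1];
    if (last && lo <= last[1] + 1) {
      last[1] = Math.max(last[1], hi); // extend the previous range
    } else {
      merged.push([lo, hi]);
    }
  }
  return merged;
}

// A few entries from the list above: a duplicate start, chunks, and adjacent ranges.
const sample = [
  [0x3400, 0x4DB5], [0x3400, 0x4DBF],
  [0x4E00, 0x62FF], [0x6300, 0x77FF], [0x4E00, 0x9FCC],
  [0x20000, 0x215FF], [0x21600, 0x230FF],
];

const tidy = mergeRanges(sample).map(([lo, hi]) => lo.toString(16) + '-' + hi.toString(16));
console.log(tidy.join(', ')); // 3400-4dbf, 4e00-9fcc, 20000-230ff
```

Applied to the full list, the same merge should leave eight non-overlapping ranges, which is also easier to compare against the block tables in the other answers.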
0
Jun 09 '19 at 22:45