Why are there duplicate characters in Unicode?

I see several duplicate characters in Unicode. For example, the character “C” can be represented by code points U+0043 and U+0421. Why is this so?

+5
5 answers

As others have noted, your main mistake is to confuse two scripts, Latin and Cyrillic, because some of their glyphs look alike, namely C (U+0043 LATIN CAPITAL LETTER C) and С (U+0421 CYRILLIC CAPITAL LETTER ES). There are many pairs of characters that look alike but are distinct characters; you will find many of them among Latin, Greek and Cyrillic, for example. Most of the time, though, the resemblance holds only in upper case or only in lower case.
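A minimal Python sketch, using only the standard `unicodedata` module, shows that the two look-alike letters are separate characters with their own names. The helper `describe` and the H/Н pair are illustrative choices, not something from the original answer.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


latin_c = "\u0043"      # looks like C
cyrillic_es = "\u0421"  # also looks like C

print(describe(latin_c))       # U+0043 LATIN CAPITAL LETTER C
print(describe(cyrillic_es))   # U+0421 CYRILLIC CAPITAL LETTER ES
print(latin_c == cyrillic_es)  # False: same glyph shape, different characters

# Illustrative pair where the resemblance holds only in upper case:
# Latin H and Cyrillic Н (U+041D) look alike, but their lower-case
# forms, h and н (U+043D), do not.
cyrillic_en = "\u041D"
print(describe(cyrillic_en.lower()))  # U+043D CYRILLIC SMALL LETTER EN
```

Comparing strings by code point, as `==` does here, is why text that mixes the two scripts can look identical on screen and still fail to match.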

There are real duplicates, however, and some of them are deliberate. For example, the Latin alphabet (indeed all of ASCII) is represented a second time in the Unicode block "Halfwidth and Fullwidth Forms", between U+FF00 and U+FFEF. There are other similar examples, notably the mathematical alphabets on plane 1, which contain several more copies of the Latin alphabet.

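Some duplicates also exist for compatibility with older encodings, for example µ (U+00B5 MICRO SIGN) and μ (U+03BC GREEK SMALL LETTER MU). Such characters often have a compatibility decomposition.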
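As a rough illustration of the difference between these real duplicates and mere look-alikes, the sketch below uses Python's standard `unicodedata` module to inspect decomposition mappings and apply NFKC (compatibility) normalization. The variable names and the choice of sample characters are assumptions made for the example.

```python
import unicodedata

fullwidth_c = "\uFF23"      # FULLWIDTH LATIN CAPITAL LETTER C
math_bold_c = "\U0001D402"  # MATHEMATICAL BOLD CAPITAL C (plane 1)
micro_sign = "\u00B5"       # MICRO SIGN
greek_mu = "\u03BC"         # GREEK SMALL LETTER MU
cyrillic_es = "\u0421"      # CYRILLIC CAPITAL LETTER ES

# Raw decomposition mappings from the Unicode Character Database.
print(unicodedata.decomposition(fullwidth_c))  # <wide> 0043
print(unicodedata.decomposition(math_bold_c))  # <font> 0043
print(unicodedata.decomposition(micro_sign))   # <compat> 03BC

# Compatibility normalization folds the duplicates onto the base character.
assert unicodedata.normalize("NFKC", fullwidth_c) == "C"
assert unicodedata.normalize("NFKC", math_bold_c) == "C"
assert unicodedata.normalize("NFKC", micro_sign) == greek_mu

# Canonical normalization (NFC) leaves the micro sign untouched.
assert unicodedata.normalize("NFC", micro_sign) == micro_sign

# A look-alike from another script has no decomposition and is never folded.
assert unicodedata.decomposition(cyrillic_es) == ""
assert unicodedata.normalize("NFKC", cyrillic_es) == cyrillic_es
```

NFKC is a reasonable choice when comparing or searching text that may contain fullwidth or styled letters, but it does nothing for cross-script look-alikes such as the Cyrillic С.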

code point. script . , ( ). , ​​ ( ). Unicode Transformation Formats.

?

:

  • . , , , .
  • script, , , , .

, . . , , "", "T". , , : "" "". .

+20

They are different characters. U+0043 is the Latin letter C; U+0421 is the Cyrillic letter Es (which sounds like S).


+8

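It is the same as with 0 and O (the digit zero and the letter O): they look similar, but they are different characters.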

+7

U+0043 is the Latin C and U+0421 is the Cyrillic Es; they look the same, but they are different letters.

+2
