Why is UTF-8 compatible with ASCII?

Question


A in UTF-8 is U+0041 LATIN CAPITAL LETTER A. A in ASCII is 065.

How is UTF-8 backwards compatible with ASCII?

+4

string utf-8 ascii

Isara Rungvitayakul Apr 12 '13 at 7:47


3 answers

Why:

Because everything was already in ASCII, having a backward-compatible Unicode format made adoption much easier. It is much easier to convert a program to use UTF-8 than to use UTF-16, and the converted program stays backward compatible because it still works with ASCII.

How:

ASCII is a 7-bit encoding, but in practice it is stored in 8-bit bytes. This means that one bit of every byte was never used.

UTF-8 simply uses this extra bit to indicate non-ASCII characters.
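As a rough illustration of that extra bit, here is a minimal Python sketch. The helper name `is_ascii_byte` and the sample word are only assumptions for the example; the point is that the top bit of each byte is enough to tell ASCII from non-ASCII.

```python
def is_ascii_byte(byte: int) -> bool:
    """True when the most significant bit is 0, i.e. a plain ASCII byte."""
    return byte & 0x80 == 0

# "naïve" holds four ASCII letters and one non-ASCII letter (ï).
data = "naïve".encode("utf-8")

# One flag per byte: ï takes two bytes, and both have the top bit set.
flags = [is_ascii_byte(b) for b in data]
print(flags)
```

Only the two bytes that encode ï are flagged as non-ASCII; the other four bytes are exactly what ASCII would have stored.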

+2

Pubby Apr 12 '13 at 7:53


Unicode is backward compatible with ASCII because ASCII is a subset of Unicode. Unicode simply uses all character codes in ASCII and adds more.

Although character codes are usually written like 0041 in Unicode, they are just numbers, so 0041 is the same value as hexadecimal 41, which is decimal 65.
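A quick way to check this, sketched in Python (the variable names are just for illustration):

```python
# The code point of "A" as a plain integer.
code_point = ord("A")

# The same number in the usual Unicode notation (four hex digits).
unicode_notation = f"U+{code_point:04X}"

# Parsing the hex digits "0041" gives back the same integer.
parsed = int("0041", 16)

print(code_point, unicode_notation, parsed)
```

The decimal 065 from the question and the 0041 in U+0041 are the same number written in two bases.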

UTF-8 is not a character set but an encoding used with Unicode. It is also compatible with ASCII, because the byte values used for multi-byte encodings lie in the part of the byte range that ASCII leaves unused.

Note that only the 7-bit ASCII character set is compatible with Unicode and UTF-8. The 8-bit character sets based on ASCII, such as IBM850 and Windows-1250, use the part of the byte range where UTF-8 has its codes for multi-byte encodings.
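To make the conflict concrete, here is a small Python sketch. The helper `decodes_as_utf8` is a made-up name for this example, and `cp1250` and `cp850` are Python's codec names for Windows-1250 and IBM850.

```python
def decodes_as_utf8(data: bytes) -> bool:
    """Return True if data is a valid UTF-8 byte sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

ascii_bytes = "A".encode("ascii")      # 7-bit ASCII
win1250_bytes = "é".encode("cp1250")   # one byte with the top bit set
ibm850_bytes = "é".encode("cp850")     # a different byte, top bit also set

print(decodes_as_utf8(ascii_bytes))
print(decodes_as_utf8(win1250_bytes))
print(decodes_as_utf8(ibm850_bytes))
```

The ASCII byte is valid UTF-8 as it stands, while the single high-bit bytes produced by the two 8-bit code pages are not, since UTF-8 reserves that range for multi-byte sequences.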

+2

Guffa Apr 12 '13 at 7:55


deceze · Accepted Answer · 2013-04-12T07:52:27+0000

ASCII uses only the first 7 bits of an 8-bit byte. So all of its combinations run from 00000000 to 01111111. Each of the 128 byte values in this range is mapped to a specific character.

UTF-8 keeps these exact mappings. The character represented by 01101011 in ASCII is represented by the same byte in UTF-8. All other characters are encoded as a sequence of several bytes in which each byte has the most significant bit set; that is, every byte of every non-ASCII character in UTF-8 has the form 1xxxxxxx.
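The bit patterns can be inspected directly. This is a minimal Python sketch; `utf8_bits` is a hypothetical helper name, and the euro sign is just one convenient non-ASCII character.

```python
def utf8_bits(text: str) -> list[str]:
    """Return each UTF-8 byte of text as an 8-digit binary string."""
    return [format(byte, "08b") for byte in text.encode("utf-8")]

# 01101011 is the letter "k"; ASCII and UTF-8 produce the same single byte.
same_bytes = "k".encode("ascii") == "k".encode("utf-8")
k_bits = utf8_bits("k")

# A non-ASCII character becomes several bytes, each starting with a 1 bit.
euro_bits = utf8_bits("€")

print(same_bytes, k_bits, euro_bits)
```

For "k" the result is the single byte 01101011, while the euro sign becomes three bytes that all begin with 1.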
