Why is char 1 byte in C?

Why is a char 1 byte long in C? Why is it not 2 or 4 bytes long?

What is the rationale for keeping it at 1 byte? I know that in Java a char is 2 bytes long. The same question applies there.

+7
c language-lawyer char
6 answers

char is 1 byte in C because the C standard defines it that way.

The most likely rationale: the (binary) representation of a char (in the standard character set) fits into 1 byte. When C was first developed, the most common character standards were ASCII and EBCDIC, which need 7-bit and 8-bit encodings, respectively. So 1 byte was enough to represent the whole character set.

OTOH, by the time Java entered the picture, extended character sets and Unicode had been introduced. So, to be future-proof and support them, Java's char was made 2 bytes wide, which can hold extended character set values.

+18

Why would a char hold more than 1 byte? A char usually represents an ASCII character. Just look at an ASCII table: the (extended) ASCII character set contains only 256 characters. So you only need to represent the numbers 0 to 255, which takes 8 bits = 1 byte.

See an ASCII table, for example here: http://www.asciitable.com/
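The arithmetic behind "7 bits for ASCII, 8 bits for extended ASCII" can be sketched in C. This is only an illustration; the helper names `distinct_values` and `fits_in_char` are made up for this example and are not part of any library.

```c
#include <limits.h>  /* UCHAR_MAX */

/* Number of distinct values that `bits` bits can represent.
   Assumes bits < 32, so the shift stays within unsigned long. */
static unsigned long distinct_values(unsigned bits)
{
    return 1UL << bits;
}

/* Non-zero if the code `v` fits into a single unsigned char.
   UCHAR_MAX is at least 255 on every conforming implementation. */
static int fits_in_char(unsigned long v)
{
    return v <= UCHAR_MAX;
}
```

With these, `distinct_values(7)` is 128 (plain ASCII) and `distinct_values(8)` is 256 (extended ASCII), and every code from 0 to 255 passes `fits_in_char`.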

That's for C. When Java was designed, its designers expected that 16 bits = 2 bytes would be enough to hold any character (including Unicode) in the future.

+5

This is because the C language is 37 years old, and there was no need for more than 1 byte per char, since only the 128 ASCII characters were used ( http://en.wikipedia.org/wiki/ASCII ).

+5

When C was developed (in the early 1970s; its developers published the first book on it in 1978), the two main character encoding standards were ASCII and EBCDIC, which are 7-bit and 8-bit character encodings, respectively. At that time, memory and disk space were both far scarcer; C was popularized on machines with a 16-bit address space, and using more than one byte per character in strings would have been considered wasteful.

By the time Java arrived (in the mid-1990s), people with some vision could see that a language could use an international standard for character encoding, and so Unicode was chosen for its definition. By then, memory and disk space were less of a problem.

+2

You do not need more than one byte to represent the entire ASCII table (128 characters).

But C has other types with more room to store data, such as int (typically 4 bytes) or long double (12 bytes on some platforms).

They all hold numeric values (even characters: although they are displayed as “letters”, they are “numbers”, and you can compare them, add them, ...).
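A small sketch of characters behaving as numbers. The function names `next_char` and `digit_value` are hypothetical, chosen for this example. The examples stick to digits because C guarantees that '0' through '9' have consecutive codes; the same is not guaranteed for letters in every character set.

```c
/* Returns the character whose code is one greater than c.
   For '0'..'8' this is guaranteed to be the next digit. */
static char next_char(char c)
{
    return (char)(c + 1);
}

/* Converts a decimal digit character to its numeric value
   by subtracting the code of '0'. */
static int digit_value(char c)
{
    return c - '0';
}
```

So `next_char('0')` yields `'1'`, `digit_value('7')` yields `7`, and a comparison such as `'3' < '8'` is an ordinary integer comparison.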

These are just different standard sizes, like cm and m for length.

0

The C language standard defines an abstract machine in which all objects occupy an integer number of abstract storage units, each consisting of a fixed number of bits (given by the CHAR_BIT macro in limits.h). Each storage unit must be uniquely addressable. A storage unit is defined as the amount of storage occupied by a single character from the basic character set [1]. Thus, by definition, the size of the char type is 1.
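These definitions can be checked at compile time. The following is a minimal sketch assuming a C11 compiler (for `_Static_assert`); the helper `bits_in` is a hypothetical name introduced for illustration.

```c
#include <limits.h>  /* CHAR_BIT */
#include <stddef.h>  /* size_t */

/* sizeof(char) is 1 by definition; compilation fails otherwise. */
_Static_assert(sizeof(char) == 1, "sizeof(char) is always 1");

/* The standard requires at least 8 bits per storage unit. */
_Static_assert(CHAR_BIT >= 8, "CHAR_BIT is at least 8");

/* Number of bits in an object that occupies n storage units. */
static size_t bits_in(size_t n)
{
    return n * CHAR_BIT;
}
```

Here `bits_in(sizeof(char))` always equals `CHAR_BIT`, while `bits_in(sizeof(int))` depends on the implementation.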

Ultimately, these abstract storage units have to be mapped onto physical hardware. Most common architectures use individually addressable 8-bit bytes, so char objects usually map to a single 8-bit byte.

Usually.

Historically, native byte sizes have ranged from 6 to 9 bits. In C, the char type must be at least 8 bits wide to represent all characters in the basic character set, so to support a machine with 6-bit bytes, a compiler may have to map a char object onto two native machine bytes, with CHAR_BIT being 12. sizeof(char) is still 1, so a type of size N will map to 2 * N native bytes.
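The 6-bit example can be worked through as plain arithmetic. This sketch does not model any real compiler; `native_bytes` and its parameters are hypothetical names, and it assumes the storage unit width is a whole multiple of the hardware byte width.

```c
#include <stddef.h>  /* size_t */

/* Native machine bytes occupied by an object of `size` C storage units,
   when each storage unit has `char_bit` bits and a hardware byte has
   `native_bits` bits. Assumes char_bit is a multiple of native_bits. */
static size_t native_bytes(size_t size, unsigned char_bit, unsigned native_bits)
{
    return size * (char_bit / native_bits);
}
```

For the machine described above, `native_bytes(1, 12, 6)` gives 2 for a char and `native_bytes(4, 12, 6)` gives 8 for a type of size 4, whereas on a typical 8-bit-byte machine `native_bytes(4, 8, 8)` is just 4.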


1. The basic character set consists of all 26 English letters in both upper and lower case, the 10 digits, punctuation and other graphic characters, and control characters such as newline, tab, and form feed, all of which fit comfortably into 8 bits.
0
