What is the reason for the following implementation of C char array storage?

What is the reason behind the following char array implementation?

    char *ch1 = "Hello";  // Read-only data
    /* if we try ch1[1] = ch1[2]; we will get a seg fault, since the value
       is stored in the constant code segment */

    char ch2[] = "World"; // Read-write data
    /* if we try ch2[1] = ch2[2]; it will work. */

According to the book Head First C (pp. 73-74), the ch2[] array is stored both in the constant code segment and on the function stack. What is the reason for keeping duplicates in both the code segment and stack memory? Why can't the value be kept only on the stack, given that it is not read-only data?

+8
c language-design
3 answers

we will get a seg fault, since the value is stored in the constant code segment

That is not quite accurate: your program crashes because it receives a signal indicating a segmentation violation (SIGSEGV), which by default terminates the program. But that is not the main point. Modifying a string literal is undefined behavior, whether or not it is stored in a read-only segment, and undefined behavior covers far more than you might think.

the ch2[] array is stored both in the constant code segment and in the function stack.

This is an implementation detail and should not concern you: as far as ISO C is concerned, these statements have no meaning. It also means that it can be implemented differently elsewhere.

When you write

    char ch2[] = "World";

"World", which is a string literal, is copied into ch2, which is what you would end up doing by hand if you used malloc and pointers. Now, why is it copied?

One reason is the one you might expect: if you could modify such a string literal, what would happen if another part of the code referred to it and expected the original value? Sharing string literals is efficient, because the same literal can be reused across your program to save space.

By copying it, you have your own copy of the string (you "own" it), and you can modify it as you wish.
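The manual equivalent of that copy might look like the sketch below. The helper names make_writable_copy and second_char_after_edit are hypothetical, introduced only for illustration; the point is that both paths end up with private, writable storage rather than a pointer into the literal.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: builds a private, writable copy of a string by hand,
   roughly what `char ch2[] = "World";` arranges for you automatically
   (except that the array lives in automatic storage, not on the heap). */
char *make_writable_copy(const char *literal)
{
    assert(literal != NULL);
    size_t size = strlen(literal) + 1;   /* include the terminating '\0' */
    char *copy = malloc(size);
    if (copy != NULL) {
        memcpy(copy, literal, size);
    }
    return copy;                         /* caller must free() the result */
}

/* The same edit applied to an array initialized from a literal. */
char second_char_after_edit(void)
{
    char ch2[] = "World";   /* sizeof ch2 is 6: five letters plus '\0' */
    ch2[1] = ch2[2];        /* fine: ch2 is our own copy of the data */
    return ch2[1];
}
```

With either approach the write goes to storage the program owns, so the literal itself is never touched.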

Quoting the "Rationale for American National Standard for Information Systems, Programming Language C":

String literals are specified to be unmodifiable. This specification allows implementations to share copies of strings with identical text, to place string literals in read-only memory, and to perform certain optimizations. However, string literals do not have the type array of const char, in order to avoid the problems of pointer type checking, particularly with library functions, since assigning a pointer to const char to a plain pointer to char is not valid.
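Because the literal's type is not const-qualified, the compiler will not reject a write through a plain char *. One common way to get that check back is to declare the pointer as const char *. This is a minimal sketch; the names greeting and greeting_char are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Pointer to const char: any write through it is rejected at compile time. */
static const char *const greeting = "Hello";

/* Reads one character through the const-qualified pointer. */
char greeting_char(size_t i)
{
    assert(i < 5);   /* "Hello" has five characters before the '\0' */
    /* greeting[1] = greeting[2];    would not compile: read-only location */
    return greeting[i];
}
```

Reading through the pointer stays valid; only the accidental modification is turned from undefined behavior into a compile-time diagnostic.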

+3

First, let's clear something up. String literals are not necessarily read-only data; it is just that attempting to modify them is undefined behavior.

It does not necessarily have to crash; it may work just fine. But, since it is undefined behavior, you should not rely on it if you want your code to run on another implementation, another version of the same implementation, or even in a different environment.

This probably dates back to before the standards were written (the original mandate of ANSI/ISO was to codify existing practice, not to create a new language). On many implementations, strings would share space for efficiency, so that code such as:

    char *good = "successful";
    char *bad  = "unsuccessful";

would result in:

    good---------+
    bad--+       |
         |       |
         V       V
       | u | n | s | u | c | c | e | s | s | f | u | l | \0 |

Therefore, if you changed one of the characters in good, it would also change bad.
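Whether a given compiler actually overlaps two literals like this is up to the implementation, but it can be observed at run time with a pointer equality test. The helper shares_tail below is a hypothetical name used only for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `tail` points at the character
   `offset` positions into `whole`, i.e. the two strings share storage,
   and 0 otherwise. Only pointer equality is used, which is well defined
   even when the pointers refer to unrelated objects. */
int shares_tail(const char *whole, size_t offset, const char *tail)
{
    assert(offset <= strlen(whole));   /* keep whole + offset inside the string */
    return (whole + offset) == tail;
}
```

Given the declarations of good and bad from the answer, shares_tail(bad, 2, good) may return either 1 or 0, depending on whether the implementation merged the literals. An array such as char own[] = "successful"; always yields 0, because it is a separate object.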

The reason you can modify something like:

    char indifferent[] = "meh";

is that, whereas good and bad point to a string literal, this statement actually creates a character array big enough to hold "meh" and then copies the data into it [1]. The copy of the data can be freely changed.

In fact, the C99 rationale document explicitly cites this as one of the reasons:

String literals are not required to be modifiable. This specification allows implementations to share copies of strings with identical text, to place string literals in read-only memory, and to perform certain optimizations.

But regardless of the why, the standard is quite clear on the what. From C11 6.4.5 String literals:

7/ It is unspecified whether these arrays are distinct provided their elements have the appropriate values. If the program attempts to modify such an array, the behavior is undefined.

The latter case (initializing a character array from a literal) is covered in 6.7.6 Declarators and 6.7.9 Initialization.


[1] Although it's worth noting that the normal "as if" rules apply here (as long as the implementation acts as if it conforms to the standard, it can do what it likes).

In other words, if the implementation can determine that you never try to modify the data, it may well skip the copy and use the original.

+7

This is only a partial answer, with a counterexample to the claim that a string literal is stored in read-only memory:

    #include <stdio.h>

    int main(void)
    {
        char a[] = "World";   /* array initialized from a string literal */
        printf("%s", a);
        return 0;
    }

gcc -O6 -S cc

    .LC0:
        .string "%s"                ;; String literal stored as expected
                                    ;; in read-only area within code
        ...
        movl    $1819438935, (%rsp) ;; First four bytes: "Worl"
        movw    $100, 4(%rsp)       ;; next two bytes: "d\0"
        call    printf
        ...

Here only the semantics of the literal are implemented; the literal "World\0" itself does not even exist in the output.
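To see that the two immediates in the listing really do encode the string, they can be unpacked byte by byte, least significant byte first (the order in which x86-64 stores them). This is a small sketch assuming an ASCII execution character set; unpack_little_endian is a hypothetical helper, not something the compiler emits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: writes the `count` low-order bytes of `value` into
   `out`, least significant byte first, then appends '\0' so the result
   can be used as a C string. `out` must have room for count + 1 chars. */
void unpack_little_endian(uint32_t value, size_t count, char *out)
{
    assert(count <= 4);
    for (size_t i = 0; i < count; i++) {
        out[i] = (char)((value >> (8 * i)) & 0xFFu);
    }
    out[count] = '\0';
}
```

Since 1819438935 is 0x6C726F57, unpacking four bytes gives 0x57, 0x6F, 0x72, 0x6C, which is "Worl". Unpacking two bytes of 100 (0x0064) gives 'd' followed by the terminating '\0'.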

In practice, only when a string literal is long enough will an optimizing compiler choose to memcpy the data from the literal pool to the stack, which requires the literal to exist as a null-terminated string.

The semantics of char *ch1 = "Hello"; on the other hand, require a linear array to exist somewhere, whose address can be assigned to the pointer ch1.

+2
