C char pointer problem

If we declare char *p = "hello";, the string literal is stored in the data section, so we cannot change the contents that p points to, although we can change the pointer itself. But I found this example in "C Traps and Pitfalls" by Andrew Koenig (AT&T Bell Laboratories, Murray Hill, NJ 07974).

Example:

    char *p, *q;
    p = "xyz";
    q = p;
    q[1] = 'Y';

q now points to memory containing the string xYz. So does p, because p and q point to the same memory.

How can this be true if the first statement above is also true? Similarly, I ran the following code:

    #include <stdio.h>

    int main(void)
    {
        char *p = "hai friends", *p1;
        p1 = p;
        while (*p != '\0')
            ++*p++;   /* increments each character of the literal */
        printf("%s %s", p, p1);
        return 0;
    }

and got the output ibj!gsjfoet.

Can you explain how we are able to change the contents in both cases? Thanks in advance.

+4
11 answers

The same example causes a segmentation fault on my system.

You are running into undefined behavior here. The .data section (note that a string literal may be placed in .text too) is not necessarily immutable: depending on the operating system and compiler, there is no guarantee that the machine will write-protect that memory (via page tables).

+5

Only your OS can guarantee that the contents of the data section are read-only, and even that involves setting up segment limits and access flags, special pointer handling, and so on, so it is not always done.

C itself has no such restriction. In a flat memory model (which almost all 32-bit OSes use), any byte in your address space is potentially writable, even in the code section. If you had a pointer to main(), some knowledge of machine language, and an OS that had things set up just right (or rather, that failed to prevent it), you could rewrite main() to simply return 0. Note that this is all black magic of a sort and is rarely done intentionally, but it is part of what makes C such a powerful language for systems programming.

+4

Even if you can do it and there appear to be no errors, it is a bad idea. Depending on the program in question, you may end up making buffer overflow attacks easier. A good article explaining this:

https://www.securecoding.cert.org/confluence/display/seccode/STR30-C.+Do+not+attempt+to+modify+string+literals

+3

Whether this works depends on the compiler.

x86 is a von Neumann architecture (as opposed to Harvard), so there is no clear distinction between "data" and "program" memory at the basic level (i.e. the compiler is not forced to use different types for program and data memory, and so will not necessarily restrict any variable to one or the other).

Thus one compiler may allow the string to be modified while another does not.

My guess is that a more lenient compiler (e.g. cl, the MS Visual Studio C++ compiler) would allow this, while a stricter compiler (e.g. gcc) would not. If your compiler allows it, chances are it is effectively changing your code to something like:

    ...
    char p[] = "hai friends";
    char *p1 = p;
    ...
    // (some disassembly required to really see what it has done, though)

perhaps with the "good intention" of letting new C/C++ coders work with fewer restrictions and fewer confusing errors. (Whether this is a "Good Thing" is up for much debate, and I will mostly keep my opinions out of this post. :P)

Out of interest, which compiler did you use?
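A minimal sketch of the array form mentioned above, written out explicitly instead of being left to the compiler. The names bump_chars and demo_bump are hypothetical; the only point is that an array initialized from a literal is the program's own writable copy, so the loop from the question becomes well defined.

```c
#include <stdio.h>

/* Hypothetical helper: add 1 to every character of a writable string. */
void bump_chars(char *s)
{
    while (*s != '\0') {
        ++*s;   /* modify the current character */
        ++s;    /* then move on to the next one */
    }
}

/* Demonstration: text is an array holding a private copy of the literal. */
void demo_bump(void)
{
    char text[] = "hai friends";   /* array, not a pointer to the literal */
    bump_chars(text);
    printf("%s\n", text);
}
```

On an ASCII system, calling demo_bump() prints ibj!gsjfoet, the same text the question reported, but without relying on the literal being writable.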

+1

In the old days, when C as described by K&R in their book "The C Programming Language" was the "standard", what you describe was perfectly normal. In fact, some compilers jumped through hoops to make string literals writable. They went to the trouble of copying the strings from the text segment to the data segment during initialization.

For a long time gcc even had a flag to restore this behavior, -fwritable-strings, although it was removed in GCC 4.0.

+1
    #include <stdio.h>

    int main(void)
    {
        int i = 0;
        char *p = "hai friends", *p1;
        p1 = p;
        while (*(p + i) != '\0') {
            *(p + i);   /* only reads the character; nothing is modified */
            i++;
        }
        printf("%s %s", p, p1);
        return 0;
    }

This code prints: hai friends hai friends

+1

Changing string literals is a bad idea, but that does not mean it will not appear to work.

One really good reason not to do it: your compiler is allowed to take multiple instances of the same string literal and make them point to the same block of memory. So if "xyz" appears somewhere else in your code, you could inadvertently break other code that expects it to stay constant.
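The sharing described above can be observed by comparing the addresses of two identical literals. This is only a sketch with hypothetical function names; whether the two pointers compare equal depends on the compiler and its settings, so no particular result should be relied on.

```c
#include <stdio.h>

/* Returns 1 if two identical literals share storage, 0 otherwise.
   Either answer is allowed; it depends on the compiler. */
int literals_share_storage(void)
{
    const char *first = "xyz";
    const char *second = "xyz";
    return first == second;
}

/* Prints which of the two outcomes this particular build produced. */
void demo_sharing(void)
{
    printf("identical literals %s storage here\n",
           literals_share_storage() ? "share" : "do not share");
}
```

If the function returns 1, a write through one pointer would silently change what the other pointer sees as well.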

0

Your program also works on my system (Windows + Cygwin). However, the standard says you should not do this, and the effect is undefined.

The following is an excerpt from the book C: A Reference Manual, 5/E, page 33:

You should not try to modify the memory that holds the characters of a string constant, since it may be read-only.

    char p1[] = "Always writable";
    char *p2 = "Possibly not writable";
    const char p3[] = "Never writable";

Writing to p1 always works; writing through p2 may work or may cause a run-time error; writing to p3 always causes a compile-time error.
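One way to turn the "may work or may crash" case into a compile-time error is to declare pointers to literals as const char * and take an explicit copy when the text has to change. The sketch below assumes that approach; writable_copy and demo_const are hypothetical names, not functions from any library.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a heap-allocated, writable copy of s,
   or NULL if allocation fails. The caller must free() the result. */
char *writable_copy(const char *s)
{
    size_t size = strlen(s) + 1;   /* include the terminating '\0' */
    char *copy = malloc(size);
    if (copy != NULL)
        memcpy(copy, s, size);
    return copy;
}

/* Returns 0 on success, -1 if the allocation failed. */
int demo_const(void)
{
    /* const makes the compiler reject writes through label */
    const char *label = "Possibly not writable";
    /* label[0] = 'p';   <- would not compile */

    char *copy = writable_copy(label);
    if (copy == NULL)
        return -1;
    copy[0] = 'p';                 /* fine: this memory is ours */
    free(copy);
    return 0;
}
```

The copy costs an allocation, but the program no longer depends on where the compiler chose to store the literal.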

0

Although modifying a string literal may be possible on your system, that is a quirk of your platform, not a guarantee of the language. The C language itself knows nothing about .data or .text sections. Those are all implementation details.

On some embedded systems you will not even have a file system to hold a file with a .text section. On some such systems your string literals will be stored in ROM, and trying to write to the ROM will simply crash the device.

If you write code that depends on undefined behavior and works only on your platform, you can be sure that sooner or later someone will decide it is a good idea to port it to some new device that does not work the way you expected. When that happens, an angry pack of embedded developers will hunt you down and strike you.

0

p effectively points to read-only memory. Assigning to the array that p points to is undefined behavior. Just because the compiler lets you get away with it does not mean it is OK.

Take a look at this question from the comp.lang.c FAQ list, Question 1.32:

Q: What is the difference between these initializations?

    char a[] = "string literal";
    char *p = "string literal";

My program crashes if I try to assign a new value to p[i].

A: A string literal (the formal term for a double-quoted string in C source) can be used in two slightly different ways:

  • As the initializer for an array of char, as in the declaration of char a[], it specifies the initial values of the characters in that array (and, if necessary, its size).
  • Anywhere else, it turns into an unnamed, static array of characters, and this unnamed array may be stored in read-only memory, so it cannot necessarily be modified. In an expression context, the array is converted at once to a pointer, as usual (see section 6), so the second declaration initializes p to point to the unnamed array's first element.
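The difference between the two uses shows up in sizeof. The following is a small sketch with hypothetical function names; the pointer is declared const char * here as a precaution, which is a choice made for this example and not part of the FAQ text.

```c
#include <stddef.h>

/* An array initialized from the literal takes its size from it:
   14 characters plus the terminating '\0'. */
size_t array_size(void)
{
    char a[] = "string literal";
    return sizeof a;
}

/* A pointer initialized from the same literal is just a pointer;
   its size does not depend on the length of the string. */
size_t pointer_size(void)
{
    const char *p = "string literal";
    return sizeof p;
}
```

array_size() returns 15, while pointer_size() returns whatever the size of a pointer is on the platform.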

Some compilers have a switch controlling whether string literals are writable or not (for compiling old code), and some may have options to cause string literals to be formally treated as arrays of const char (for better error catching).

0

I think you are confused about a very important general concept that you need to understand when using C, C++, or other low-level languages. In a low-level language there is an implicit assumption that the programmer knows what he is doing and makes no programming errors.

This assumption allows the language implementers to simply ignore what should happen if the programmer breaks the rules. The end effect is that in C or C++ there is no "runtime error" guarantee: if you do something bad, what happens is simply NOT DEFINED ("undefined behavior" is the legalese term). It may be a crash (if you are very lucky), or it may be apparently nothing (unfortunately, most of the time, possibly followed by a crash in a perfectly valid place a million instructions later).

For example, if you access an element outside an array, MAYBE you will get a crash, maybe not, and maybe a demon will even come out of your nose (these are the "nasal demons" you can find on the Internet). It is simply not something that whoever wrote the compiler took care to think about.

Just never do it (if you care about writing decent programs).

An additional burden for those who use low-level languages is that you must learn all the rules very well and never break them. If you break a rule, you cannot expect a "runtime error angel" to help you; only "undefined behavior demons" live there.
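Since the language will not check array accesses for you, one option is to check explicitly before indexing. The following is a minimal sketch under that assumption; get_checked is a hypothetical helper, not a standard function.

```c
#include <stddef.h>

/* Hypothetical helper: read arr[index] only if index is in range.
   Returns 1 and stores the value in *out on success, 0 otherwise. */
int get_checked(const int *arr, size_t len, size_t index, int *out)
{
    if (arr == NULL || out == NULL || index >= len)
        return 0;           /* refuse instead of reading out of bounds */
    *out = arr[index];
    return 1;
}
```

A caller that tests the return value gets a predictable failure path in place of undefined behavior, at the cost of one comparison per access.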

0

Source: https://habr.com/ru/post/1313472/

