C strtok() and read-only string literals

char *strtok(char *s1, const char *s2);

Repeated calls to this function break the string s1 into "tokens": the string is split into substrings, each terminated by a '\0' character, where the '\0' overwrites a character of s1 that appears in the string s2. The first call passes the string s1; subsequent calls pass NULL as the first argument. A pointer to the beginning of the current token is returned; NULL is returned if there are no more tokens.
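As a rough illustration of that calling pattern, here is a minimal sketch. The helper name split_tokens and its parameters are made up for this example and are not part of the library; it assumes the caller passes a writable buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a library function): splits `s` in place on any
   character in `delims` and stores up to `max` token pointers in `out`.
   Returns the number of tokens stored. `s` must be writable. */
size_t split_tokens(char *s, const char *delims, char **out, size_t max)
{
    assert(s != NULL && delims != NULL && out != NULL);

    size_t n = 0;
    /* The first call passes the string; later calls pass NULL to continue. */
    for (char *tok = strtok(s, delims);
         tok != NULL && n < max;
         tok = strtok(NULL, delims)) {
        out[n++] = tok;   /* each token points into `s` itself */
    }
    return n;
}
```

Called on a buffer declared as char line[] = "a,b;;c" with the delimiters ",;", this should store three tokens, since strtok skips runs of consecutive delimiters.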

Hello,

I am trying to use strtok just now and found out that if I pass a char* as s1, I get a segmentation fault. If I pass a char[], strtok works fine.

Why is this?

I googled around, and the reason seems to be something about char* being read-only and char[] being writable. A more thorough explanation would be greatly appreciated.

5 answers

What did you initialize the char * with?

If something like

 char *text = "foobar"; 

then you have a pointer to some read-only characters.

If instead you have

 char text[7] = "foobar"; 

then you have an array of seven characters that you can do what you like with.

strtok writes to the string you pass to it: it overwrites the delimiter character with a null character ('\0') and keeps a pointer to the rest of the string.

So if you pass it a read-only string, it will try to write to it, and you get a segfault.
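One way around this, sketched below, is to copy the text into writable memory before tokenizing. The function name count_tokens is invented for this example; it only assumes the standard malloc, memcpy and strtok.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not a library function): counts the tokens in `text`
   without modifying it, so it is safe to call with a string literal.
   Returns 0 if the scratch buffer cannot be allocated. */
size_t count_tokens(const char *text, const char *delims)
{
    assert(text != NULL && delims != NULL);

    size_t len = strlen(text) + 1;      /* include the terminating '\0' */
    char *buf = malloc(len);            /* writable scratch copy */
    if (buf == NULL)
        return 0;
    memcpy(buf, text, len);

    size_t count = 0;
    /* strtok only ever writes into buf, never into the caller's string. */
    for (char *tok = strtok(buf, delims); tok != NULL; tok = strtok(NULL, delims))
        count++;

    free(buf);
    return count;
}
```

With this, count_tokens("foo bar baz", " ") is fine even though the argument is a literal, because the literal itself is never written to.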

Also, because strtok keeps a pointer to the rest of the string, it is not reentrant: you can only use it on one string at a time. It is best avoided; consider strsep(3) instead, see for example http://www.rt.com/man/strsep.3.html (although that still writes to the string, so it has the same read-only/segfault problem).
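To see the "one string at a time" limitation in practice, here is a small sketch. The helper next_after_interleaving is a made-up name used only for this demonstration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not a library function): starts tokenizing `first`,
   then starts tokenizing `second`, then asks strtok for "the next token".
   Both arguments must be writable strings. */
const char *next_after_interleaving(char *first, char *second)
{
    assert(first != NULL && second != NULL);

    strtok(first, ",");        /* saved position now points into `first`  */
    strtok(second, ",");       /* saved position is overwritten: `second` */
    return strtok(NULL, ",");  /* continues `second`, not `first`         */
}
```

Given the arrays "a,b" and "x,y", the returned token is "y" rather than "b": the second strtok call silently discarded the saved position in the first string.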


An important point that has been implied but not stated explicitly:

Based on your question, I assume that you are fairly new to C programming, so I would like to explain a little more about your situation. Forgive me if I am wrong; C can be hard to learn mainly because of subtle misunderstandings of the underlying mechanisms, so I like to make everything as plain as possible.

As you know, when you write your C program, the compiler sets up a lot of things for you ahead of time based on the syntax. When you declare a variable anywhere in your code, for example:

int x = 0;

The compiler reads this line of text and says to itself: OK, I need to replace all occurrences of x in the current scope with a constant reference to a region of memory that I have allocated to hold an integer.

When your program runs, this line leads to a further action: I need to set the region of memory that x refers to to the int value 0.

Note the subtle difference here: the memory location that x refers to is constant (and cannot be changed). However, the value stored at that location can be changed. You do this in your code through assignment, e.g. x = 15;. Also note that the single line of code actually amounts to two separate instructions to the compiler.

If you have a statement like:

char *name = "Tom";

The compiler's process looks like this: OK, I need to replace all occurrences of name in the current scope with a constant reference to a region of memory that I have allocated to hold a char pointer value. And it does so.

But there is a second step, which amounts to this: I need to create a constant array of characters that holds the values 'T', 'o', 'm' and the terminating '\0'. Then I need to replace the part of the code that says "Tom" with the memory address of that constant string.

When your program runs, the final step occurs: setting the char pointer (which is not constant) to the memory address of the automatically generated string (which is constant).

So a char * is not read-only. Only what a const char * points to is treated as read-only. Your problem in this case is not that char * is read-only; it is that your pointer refers to a read-only region of memory.
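To make the difference between the pointer and what it points to concrete, here is a small sketch. The function name pointer_vs_array is invented for this example, and it assumes the caller supplies a result buffer of at least one byte.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not a library function): writes "Rob" into `result`
   to show which operations are allowed. `size` is the capacity of `result`. */
void pointer_vs_array(char *result, size_t size)
{
    assert(result != NULL && size > 0);

    const char *name = "Tom";   /* points at the literal's read-only array  */
    name = "Bob";               /* fine: only the pointer variable changes  */
    /* name[0] = 'R'; */        /* rejected by the compiler thanks to const */

    char copy[4];               /* writable storage owned by this function  */
    strcpy(copy, name);         /* copies "Bob" plus the terminating '\0'   */
    copy[0] = 'R';              /* fine: modifies the copy, not the literal */

    strncpy(result, copy, size - 1);
    result[size - 1] = '\0';    /* make sure the result is terminated       */
}
```

Declaring pointers to literals as const char * is what lets the compiler catch the bad write at compile time instead of leaving it to crash at run time.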

I bring all this up because understanding this issue is the barrier between looking at the function's definition in the library and working out the problem yourself, and having to ask us. I have also simplified some of the details somewhat in the hope of making the issue easier to understand.

Hope this was helpful. ;)


I blame the C standard.

 char *s = "abc"; 

could have been defined to produce the same error as

 const char *cs = "abc";
 char *s = cs;

on the grounds that string literals are not modifiable. But it wasn't; it was defined to compile. Go figure. [Edit: Mike B figured it out: "const" didn't even exist in K&R C. ISO C, plus every version of C and C++ since, wanted to be backward compatible. So it has to be valid.]

If it had been defined to give an error, then you couldn't get as far as the segfault, because strtok's first parameter is char *, so the compiler would have prevented you from passing in a pointer generated from a literal.

It may be of interest that there was at one time a plan in C++ for this to be deprecated ( http://www.open-std.org/jtc1/sc22/wg21/docs/papers/1996/N0896.asc ). But 12 years later I can't persuade gcc or g++ to give me any kind of warning for assigning a literal to a non-const char *, so it isn't all that loudly deprecated.

[Edit: aha: -Wwrite-strings, which isn't included in -Wall or -Wextra]


In short:

 char *s = "HAPPY DAY";
 printf("\n %s ", s);
 s = "NEW YEAR"; /* Valid */
 printf("\n %s ", s);
 s[0] = 'c'; /* Invalid */

If you look at your compiler's documentation, chances are there is an option to make these strings writable.
