Using arrays of character strings: pointer arrays. Are they multidimensional arrays?

I recently read C++ for Dummies, and either the title is a misnomer or they didn't count on me. In the section on using pointer arrays with character strings, they show a function that has left me completely stumped, and I don't know where to turn.

    char* int2month(int nMonth)
    {
        //check to see if value is in range
        if ((nMonth < 0) || (nMonth > 12))
            return "invalid";

        //nMonth is valid - return the name of the month
        char* pszMonths[] = {"invalid", "January", "February", "March",
                             "April", "May", "June", "July", "August",
                             "September", "October", "November", "December"};
        return pszMonths[nMonth];
    }

First (though this is not the main question), I don't understand why the return type is a pointer, and how you can return pszMonths without it going out of scope. I have read about this in the book and on the Internet, but I do not understand this example.

The main question I have is "How does it work?!?!". I do not understand how you can create an array of pointers and actually initialize them. If I remember correctly, you cannot do this with numeric data types. Is each pointer in a "pointer array" like an array containing the individual characters that make up the words? This whole thing just boggles my mind.

August 20 - Since it seems that people are trying to pin down where my confusion actually lies, I will try to explain it better. This section of code in particular concerns me:

    //nMonth is valid - return the name of the month
    char* pszMonths[] = {"invalid", "January", "February", "March",
                         "April", "May", "June", "July", "August",
                         "September", "October", "November", "December"};

I thought that when you declare a pointer, you can only assign it the address of some other, already existing value. I am confused by the fact that the array of pointers (going by the book here) seems to initialize the names of the months. I did not think that pointers could actually initialize values. Is the array a dynamic memory allocation? Is "invalid" essentially equivalent to "new char;" or something similar?

I will re-read the answers in case they already address my questions, but I just did not understand them the first time.

6 answers

OK, let's take it one line at a time.

 char* int2month(int nMonth) 

This line is most likely WRONG, because it says that the function returns a pointer to a mutable char (by convention, this will be the first char element of an array). Instead, char const* or const char* should be used as the result type. These two spellings mean exactly the same thing, namely a pointer to a char that you cannot change.
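A minimal C++ sketch of that point, assuming a C++11 or later compiler; the function greeting is a made-up illustration, not from the book. It checks at compile time that both spellings name one type, and shows that the const applies to the characters, not to the pointer:

```cpp
#include <type_traits>

// `const char*` and `char const*` are two spellings of the same type:
// a pointer through which the chars cannot be modified.
static_assert(std::is_same<const char*, char const*>::value,
              "both spellings name the same type");

// The const applies to the chars, not to the pointer, so the pointer
// itself may still be re-seated to point at another string.
const char* greeting(bool formal)
{
    const char* p = "hi";
    if (formal) {
        p = "good day";  // OK: the pointer itself is not const
    }
    // p[0] = 'H';       // would not compile: the chars are const
    return p;
}
```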

 { 

This is just the opening brace of the function body. The body of the function ends at the matching closing brace.

 //check to see if value is in range 

This is a comment. The compiler ignores it.

 if ((nMonth < 0) || (nMonth > 12)) return "invalid"; 

Here, the return is executed if and only if the condition in the if is satisfied. The goal is to behave predictably when given an invalid argument value. However, the check is probably WRONG, because it allows both the values 0 and 12, giving a total of 13 valid values, while a calendar year has only 12 months.

By the way, technically, the value given in the return statement is an array of 8 char elements, namely 7 characters plus a zero byte at the end. This array is implicitly converted to a pointer to its first element, which is called array-to-pointer decay (type decay). This particular decay, from a string literal to a pointer to non-const char, is specifically supported in C++98 and C++03 for compatibility with old C, but it is not valid in the upcoming C++0x standard.
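To make the type and the decay concrete, here is a small C++11 sketch; kInvalid is a hypothetical name introduced only for this illustration. The literal's type is checked at compile time, and the pointer initialized from it ends up pointing at the 'i':

```cpp
#include <type_traits>

// "invalid" is an lvalue of type const char[8]: 7 visible
// characters followed by the terminating zero byte.
static_assert(std::is_same<decltype("invalid"), const char (&)[8]>::value,
              "a string literal is an array of const char");
static_assert(sizeof("invalid") == 8, "7 characters plus the zero byte");

// Initializing a pointer from the array applies array-to-pointer decay:
// kInvalid receives the address of the array's first element, the 'i'.
constexpr const char* kInvalid = "invalid";
```

Note that the element type is const char, which is why the C++0x rules reject binding it to a plain char*.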

A book should not teach such ugly things; use const for the result type.
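Applied to the book's function, the fix could look like the following sketch (C++11 or later assumed). It keeps the original range check and month table and only adds const to the result type and to the table:

```cpp
#include <cassert>
#include <cstring>

// The book's int2month with const added, since string literals are
// arrays of const char.
const char* int2month(int nMonth)
{
    // check to see if value is in range
    if ((nMonth < 0) || (nMonth > 12))
        return "invalid";

    // nMonth is valid - return the name of the month.
    // Index 0 holds "invalid", so months 1..12 map onto indices 1..12.
    const char* const pszMonths[] = {"invalid", "January", "February",
                                     "March", "April", "May", "June",
                                     "July", "August", "September",
                                     "October", "November", "December"};
    return pszMonths[nMonth];
}
```

With this version, int2month(3) returns a pointer to "March", and a caller that tries to write through the result gets a compile error instead of undefined behavior.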


    //nMonth is valid - return the name of the month
    char* pszMonths[] = {"invalid", "January", "February", "March",
                         "April", "May", "June", "July", "August",
                         "September", "October", "November", "December"};

This array initialization again involves this decay. This is an array of pointers. Each pointer is initialized with a string literal, which by type is an array and decays to a pointer.
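This also answers the title question: an array of pointers is not a multidimensional array. A small C++ sketch contrasting the two forms; the names pointerTable and charGrid are made up for illustration:

```cpp
#include <cassert>
#include <cstring>

// An array of 13 pointers. Each element is only an address; the month
// names live elsewhere, in the string literals' static storage.
const char* const pointerTable[] = {
    "invalid", "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December"};

// A genuine two-dimensional array: 13 rows of 10 chars, enough for the
// longest name ("September": 9 chars + '\0'). The characters are copied
// into each row, and shorter names leave the rest of their row as '\0'.
const char charGrid[13][10] = {
    "invalid", "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December"};

static_assert(sizeof(pointerTable) / sizeof(pointerTable[0]) == 13,
              "13 elements deduced from the initializer list");
static_assert(sizeof(pointerTable[0]) == sizeof(const char*),
              "each element is just a pointer");
static_assert(sizeof(charGrid) == 13 * 10, "130 chars stored inline");
```

The grid has to pick a fixed row length and stores the characters inside every row, while the pointer table only stores addresses. That is why the pointer form is the usual choice for strings of varying length.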

By the way, the psz prefix is a monstrosity called Hungarian notation. It was invented for C programming, in support of the help system in the Microsoft Programmer Workbench. In modern programming it serves no useful purpose; instead, it just makes simple code read like gibberish. You really do not want to adopt it.

 return pszMonths[nMonth]; 

This indexing has formal Undefined Behavior, also affectionately known as "UB", if nMonth has the value 12, because there is no array element at index 12. In practice, you will get a gibberish result.

EDIT: oh, I didn't notice that the author put the month name "invalid" at the front, which makes for 13 array elements. How to obfuscate code... I did not notice it because it is so bad and unexpected; the check for invalidity is already performed earlier in the function.


 } 

And this is the closing brace of the function body.

Cheers and hth.,


Perhaps a line-by-line explanation will help.

    /* This function takes an int and returns the corresponding month
       0 returns invalid
       1 returns January
       2 returns February
       3 returns March
       ...
       12 returns December
    */
    char* int2month(int nMonth)
    {
        // if nMonth is less than 0 or more than 12, it's an invalid number
        if ((nMonth < 0) || (nMonth > 12))
            return "invalid";

        // this line creates an array of char* (strings) and fills it with the names of the months
        //
        char* pszMonths[] = {"invalid",   // index 0
                             "January",   // index 1
                             "February",  // index 2
                             "March",     // index 3
                             "April",     // index 4
                             "May",       // index 5
                             "June",      // index 6
                             "July",      // index 7
                             "August",    // index 8
                             "September", // index 9
                             "October",   // index 10
                             "November",  // index 11
                             "December"   // index 12
                            };

        // use nMonth to index the pszMonths array to return the appropriate month
        // if nMonth is 1, returns January because pszMonths[1] is January
        // if nMonth is 2, returns February because pszMonths[2] is February
        // etc
        return pszMonths[nMonth];
    }

First of all, something you may not know: a string literal in your program (the stuff with double quotes around it) is really of type char* [1].

The second thing you might not understand is that indexing into a char* array (which is what char* pszMonths[] is) gives you a char*, which is a string.

The reason you can return something from the local scope in this case is that string literals are stored in the program at compile time and are never destroyed. For example, this is perfectly fine:

 char* blah() { return "blah"; } 

And it is almost like doing this [2]:

 int blah() { return 5; } 

Second, the = {/* stuff */} after the array declaration is called an initializer list. If you do not specify the size of the array, as here, the compiler works out how large the array is from the number of elements in the initializer list. So char* pszMonths[] means "array of char*", and since the initializer list contains "invalid", "January", "February", etc., which are char*s [1], you are simply initializing your char* array with several char*s. And you are mistaken that you cannot do this with numeric types: you can do it with any type, numeric types and strings alike.
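For example, the same initializer-list syntax works with numbers, and with pointers to numbers. A short C++ sketch; daysInMonth, january, february and dayPointers are illustrative names, and constexpr (C++11) is used only so the element count can be checked at compile time:

```cpp
#include <cassert>

// The element count (13) is deduced from the initializer list, just as
// for pszMonths. Days per month in a non-leap year; index 0 is a dummy
// entry mirroring the "invalid" slot in the month-name table.
constexpr int daysInMonth[] = {0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
static_assert(sizeof(daysInMonth) / sizeof(daysInMonth[0]) == 13,
              "13 elements deduced from the initializer list");

// An array of pointers to int is initialized the same way: each element
// is set to an address, here of ints with static storage duration.
const int january = 31;
const int february = 28;
const int* const dayPointers[] = {&january, &february};
```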

[1] This is not really a char*; it is a char const[x], and you cannot modify this memory as you could with a char*, but that is not important to you right now.

[2] Actually, it is not quite the same, but if it helps you to think of it this way, go right ahead until you get better at C++ and can handle the various subtleties without getting overwhelmed.


What do you expect int2month to do?

Do you have a mental model of what memory looks like? Here is my picture of memory, for example:

    pszMonths = [ . , . , . , ...]
                  |   |   |
                  |   |   |
                  V   |   |
          "invalid"   |   V
                      | "February"
                      V
                  "January"

pszMonths is an array, which you should already be familiar with. However, its elements are pointers. You have to follow the arrows to the values they refer to, which in this case are strings. This indirect representation is necessary: it is not easy to do this with a flat representation, because each month name has a different length.

It is hard to tell where you are stuck without more discussion. You will have to tell us more.

[edit]

OK, you have said a little more. It sounds like you need to know a little more about the C programming model. When your program is compiled, it boils down to a chunk of code and a chunk of data.

What goes into the data? Things like string literals. Each string literal is laid out somewhere in memory. If your compiler is good and you use the same literal twice, the compiler will not keep two copies but will reuse a single one.

Here is a small demonstration program.

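    #include <inttypes.h>
    #include <stdio.h>

    int main(void)
    {
        const char *name1 = "foo";
        const char *name2 = "foo";
        const char *name3 = "bar";

        /* uintptr_t holds a pointer value without truncation;
           casting a pointer to int can lose bits on 64-bit systems */
        printf("The address of the string in the data segment is: %" PRIuPTR "\n", (uintptr_t) name1);
        printf("The address of the string in the data segment is: %" PRIuPTR "\n", (uintptr_t) name2);
        printf("The address of the string in the data segment is: %" PRIuPTR "\n", (uintptr_t) name3);
        return 0;
    }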

Here is what it looks like when I run this program:

    $ ./a.out
    The address of the string in the data segment is: 134513904
    The address of the string in the data segment is: 134513904
    The address of the string in the data segment is: 134513908

When you run a C program, part of your program's data (as well as its code, of course) is loaded into memory. Any pointer that refers to a location in the data remains good for as long as your program keeps running. In particular, a pointer into the data stays valid across function calls.

Look at the output more closely. name1 and name2 point to the same place in the data because they refer to the same literal string. Your C compiler is often very good at keeping the data compact, so you can see that the bytes for "bar" are stored right after the bytes for "foo".

(What we see here are low-level details, and the compiler will not always place string literals side by side: it is free to put the representations of these strings anywhere. But it's nice to see that it does so here.)

On a related note, why is it fine for a C program to do something like this:

 char* good_function() { char* msg = "ok"; return msg; } 

but not fine to do something like this:

 char* bad_function() { char msg[] = "uh oh"; return msg; } 

These two functions have completely different meanings!

  • The first one tells the compiler: "Store this string in the data segment. When this function runs, give me back its address in the data segment."
  • The second, bad function says: "When this function runs: create a temporary variable on the stack with enough space to hold "uh oh". Now pop that temporary space off the stack and return its address... oh wait, that address no longer points anywhere good..."
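If you do need to return text that is built in a local array, two common fixes are sketched below in C++ (the function names are hypothetical): make the array static, or return a std::string by value.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Fix 1: a static local array has static storage duration, so its address
// stays valid after the function returns. Unlike a literal, it may be
// modified, but every call returns the same shared buffer.
char* static_function()
{
    static char msg[] = "uh oh";
    return msg;
}

// Fix 2: return the text by value. The local array is still destroyed on
// return, but its characters have been copied into the std::string first.
std::string by_value_function()
{
    char msg[] = "uh oh";
    return std::string(msg);
}
```

Because the static version hands every caller the same buffer, a change made through one returned pointer is visible to the next caller, and the function is not thread-safe. Returning by value avoids both problems and is usually the better default in C++.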

This code does not return pszMonths; it returns one of the pointers contained in pszMonths. They point to string literals, which remain valid even after the function returns and pszMonths goes out of scope.

One confusing part of this code is that it returns char*, not char const*. This makes it easy to accidentally modify the strings, and attempting to do so results in undefined behavior.

Typically, string literals are implemented by placing the strings in the data section of the executable file. This means that pointers to them always remain valid. When the code in int2month runs, pszMonths is populated with pointers, but the underlying data is located elsewhere in the executable.

As I said earlier, this code is very unsafe and does not deserve to appear in a published book. String literals can be bound to char*, but they actually consist of char consts. This makes it easy to accidentally modify them, which leads to undefined behavior. The only reason this conversion exists is compatibility with C, and it should never be used in new code.


In C, strings are simply sequences of bytes stored in consecutive memory locations, with a zero byte marking the end of the string. For instance,

 char *s = "abcd"; 

will cause the compiler to allocate two memory locations: one five bytes long (abcd plus a terminating 0) and one large enough to hold the address of the first (s). The second location is a pointer variable; the first is what it points to.
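The two-location picture can be checked with sizeof. A C++11 sketch, using a const-qualified version of the answer's s plus a hypothetical array a for contrast:

```cpp
// s is a pointer variable: it holds only an address, and the five bytes
// "abcd\0" it points to are stored elsewhere.
constexpr const char* s = "abcd";

// a is an array: the five bytes are stored in a itself.
constexpr char a[] = "abcd";

static_assert(sizeof(s) == sizeof(const char*), "s holds just an address");
static_assert(sizeof(a) == 5, "4 characters plus the terminating zero");
```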

For an array of strings, the compiler again reserves two kinds of storage. For

 char *strings[] = {"abc", "def"}; 

strings will contain two pointers, and elsewhere there will be the bytes abc\0def\0. The first pointer then points to a and the second to d.


First of all, let's use string as a shorthand for char* (think typedef char* string;).

So:

 string int2month(int nMonth) { /* ... */ } 

You are returning a pointer to char because you cannot return an array of char in C or C++.
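A function cannot be declared to return a raw array, but it can return a struct that contains one. A hedged C++11 sketch using std::array (invalid_copy is a made-up name):

```cpp
#include <array>
#include <cassert>
#include <cstring>

// `char invalid_copy()[8];` would declare a function returning an array,
// which is ill-formed. std::array<char, 8> wraps the array in a struct,
// so it can be returned (and copied) by value.
std::array<char, 8> invalid_copy()
{
    std::array<char, 8> result{};           // all elements start as '\0'
    std::strcpy(result.data(), "invalid");  // 7 chars + '\0' fill all 8
    return result;
}
```

Each call returns a fresh copy of the characters, so the caller may modify the result freely.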


In this line:

 return "invalid"; 

"invalid" lives in the program's memory. This means that it is always there for you. (But the behavior is undefined if you try to modify it directly instead of first copying it into your own buffer, for example with strcpy()! [1])

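A sketch of the safe way to get a modifiable version of a literal, in C++: copy it into an array you own (here with std::strcpy) and modify the copy. copy_then_modify is a hypothetical helper:

```cpp
#include <cassert>
#include <cstring>

// Writing into a string literal is undefined behavior, but writing into
// an array you own is fine. std::strcpy copies the characters and the
// terminating '\0' out of the literal into the caller's buffer.
void copy_then_modify(char (&buffer)[8])
{
    std::strcpy(buffer, "invalid");  // needs room for 7 chars + '\0'
    buffer[0] = '!';                 // modifies the copy, not the literal
    buffer[1] = '?';
}
```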

Imagine the following:

    char* szInvalid = "invalid";
    char* szJanuary = "January";
    char* szFebruary = "February";
    string szMarch = "March";
    char* pszMonths[] = {szInvalid, szJanuary, szFebruary, szMarch};

Do you see why this is an array of char*s?
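That can be checked directly: the array stores copies of the four pointer values, not copies of the strings. A C++ sketch of the example above, with const added (required for string literals since C++11):

```cpp
#include <cassert>

// const is added because string literals are const char arrays in C++.
const char* szInvalid = "invalid";
const char* szJanuary = "January";
const char* szFebruary = "February";
const char* szMarch = "March";

// Each element is initialized with a copy of one pointer's value (an
// address); the characters themselves are not copied anywhere.
const char* pszMonths[] = {szInvalid, szJanuary, szFebruary, szMarch};
```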


[1] If you do this:

    char* szFoo = "invalid";
    szFoo[0] = '!';   // undefined behavior: writes into a string literal
    szFoo[1] = '?';

    char* szBar = "invalid";
    // This *might* happen: szBar == "!?valid"
