"" + something in C ++

I have been seeing really bizarre behavior in my code. I believe I have tracked it down to the line marked "Here" (the code is simplified, of course):

    std::string func() {
        char c;
        // Do stuff that will assign to c
        return "" + c; // Here
    }

All kinds of things happen when I try to cout the result of this function. I think I have even managed to get pieces of underlying C++ documentation, as well as many segmentation faults. It is clear to me that this does not work in C++ (I have resorted to using stringstream to do the conversion to string), but I would like to know why. After using a lot of C# for quite some time and no C++, this has caused me a lot of pain.

+59
c++ string
Sep 12 '14 at 16:05
3 answers
  • "" is a string literal. String literals have the type "array of N const char". This particular string literal is an array of 1 const char, whose single element is the null terminator.

  • Arrays easily decay into pointers to their first element, e.g. in expressions where a pointer is required.

  • lhs + rhs is not defined for arrays as lhs and integers as rhs. But it is defined for pointers as lhs and integers as rhs, with the usual pointer arithmetic.

  • char is an integral data type (that is, it is treated as an integer) in the C++ core language.

==> string literal + character is therefore interpreted as a pointer + integer.

The expression "" + c is roughly equivalent to:

    static char const lit[1] = {'\0'};
    char const* p = &lit[0];
    p + c // "" + c is roughly equivalent to this expression
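As a hedged illustration of the types involved, the sketch below checks them at compile time. Since decltype does not evaluate its operand, no out-of-bounds pointer arithmetic actually takes place. The alias names and the helper function are made up for this example.

```cpp
#include <string>
#include <type_traits>

// decltype never evaluates its operand, so none of the expressions below
// performs real pointer arithmetic; only their types are inspected.
// (A compiler may still warn about adding a char to a string literal.)
using LiteralType     = decltype("");                      // the literal itself
using LiteralPlusChar = decltype("" + char{});             // literal + char
using StringPlusChar  = decltype(std::string{} + char{});  // std::string + char

static_assert(std::is_same<LiteralType, const char (&)[1]>::value,
              "the literal \"\" is an array of 1 const char");
static_assert(std::is_same<LiteralPlusChar, const char*>::value,
              "literal + char is pointer arithmetic and yields const char*");
static_assert(std::is_same<StringPlusChar, std::string>::value,
              "std::string + char uses std::string's operator+");

// Hypothetical helper so the same fact can be queried at run time.
constexpr bool literal_plus_char_is_pointer() {
    return std::is_same<LiteralPlusChar, const char*>::value;
}
```

If the file compiles, all three type claims hold for the compiler in use.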



You return a std::string. The expression "" + c yields a pointer to const char. The std::string constructor that takes a const char* expects it to be a pointer to a null-terminated character array.

If c != 0 , then the expression "" + c leads to Undefined Behavior:

  • For c > 1, the pointer arithmetic itself produces Undefined Behavior. Pointer arithmetic is only defined within an array, and only if the result points to an element of the same array (or one past its last element).

  • If char is signed, then c < 0 produces Undefined Behavior for the same reason.

  • For c == 1, the pointer arithmetic does not produce Undefined Behavior. This is a special case: pointing one past the last element of an array is allowed (although dereferencing such a pointer is not). It still leads to Undefined Behavior, since the std::string constructor invoked here requires its argument to be a pointer to a valid array (and a null-terminated string). The one-past-the-last element is not part of the array itself. Violating this requirement also leads to UB.




What probably happens now is that the std::string constructor tries to determine the size of the null-terminated string you passed by searching for the (first) character in the array that is equal to '\0':

    string(char const* p)
    {
        // simplified
        char const* end = p;
        while(*end != '\0') ++end;
        //...
    }

This will either produce an access violation, or the string it creates will contain "garbage". It is also possible that the compiler assumes this Undefined Behavior never happens and performs some surprising optimizations that result in weird behavior.
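For contrast, here is a minimal sketch of the same scan applied to pointers that stay inside a literal, where everything is well defined. The function names and the "hello" literal are hypothetical; the real constructor is implementation-specific.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for the length scan described above.
// Precondition: p points into a null-terminated character array.
std::size_t scan_length(const char* p) {
    const char* end = p;
    while (*end != '\0') ++end;  // walk forward until the first '\0'
    return static_cast<std::size_t>(end - p);
}

// In-bounds pointer arithmetic: "hello" is an array of 6 const char
// (five letters plus the terminator), so p + 2 points at the first 'l'.
std::string in_bounds_example() {
    const char* p = "hello";
    return std::string(p + 2);   // scan starts at 'l' and stops at '\0'
}
```

With "" the array has only one element, so any nonzero offset already leaves the range in which the scan is guaranteed to find a terminator.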




By the way, clang++ 3.5 emits a nice warning for this snippet:

    warning: adding 'char' to a string does not append to the string [-Wstring-plus-int]
        return "" + c; // Here
               ~~~^~~
    note: use array indexing to silence this warning
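The note about array indexing refers to spelling the offset as an explicit index, which makes the intent visible. A small sketch for the case where offsetting into a literal really is what is wanted; the function name and the "hello" literal are illustrative assumptions, not part of the question's code.

```cpp
#include <string>

// Hypothetical helper: returns the suffix of "hello" starting at `offset`.
// Precondition: 0 <= offset <= 5, so the resulting pointer stays inside
// the array (offset 5 points at the terminating '\0').
std::string suffix_by_indexing(int offset) {
    return std::string(&"hello"[offset]);  // same address as "hello" + offset
}
```

This only documents intent; it does not turn the pointer arithmetic into string concatenation.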

+91
Sep 12 '14 at 16:22

There are many explanations for how the compiler interprets this code, but what you probably wanted to know is what you did wrong.

It seems you are expecting the + behavior of std::string. The problem is that neither of the operands actually is a std::string. C++ looks at the types of the operands, not at the final type of the expression (here the return type, std::string), to resolve overloading. It will not pick std::string's version of + if it does not see a std::string.

If you have special behavior for an operator (either you wrote it, or you got a library that provides it), that behavior applies only when at least one of the operands has class type (or is a reference to class type; user-defined enumerations count too).

If you wrote

 std::string("") + c 

or

 std::string() + c 

or

 ""s + c // requires C++14 

then you will get the std::string behavior of the + operator.

(Note: none of these is actually a good solution, because they all create short-lived std::string temporaries, which can be avoided with std::string(1, c).)
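A minimal sketch of the two approaches side by side; the function names are hypothetical and only serve the comparison.

```cpp
#include <string>

// Direct construction: a string consisting of exactly one copy of c.
std::string from_char(char c) {
    return std::string(1, c);
}

// Also correct, but goes through a temporary std::string and its operator+.
std::string from_char_via_plus(char c) {
    return std::string() + c;
}
```

Both return a string of length 1 for any c, including '\0', which the pointer-based "" + c could never do.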

The same goes for functions. Here is an example:

 std::complex<double> ipi = std::log(-1.0); 

You will get a domain error at run time instead of the expected imaginary number. This is because the compiler has no clue that it should be using the complex logarithm here. Overloading looks only at the arguments, and the argument is a real number (type double, actually).
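A hedged sketch of the difference: making the argument complex is what selects the complex overload. The function names are made up, and the NaN result for the real overload is what IEEE 754 implementations typically produce rather than a guarantee.

```cpp
#include <cmath>
#include <complex>

// Real overload: the argument is a double, so std::log(double) is chosen.
// On IEEE 754 implementations this typically yields NaN.
double real_log_of_minus_one() {
    return std::log(-1.0);
}

// Complex overload: a std::complex<double> argument selects the complex
// logarithm, whose result for -1 has real part 0 and imaginary part pi.
std::complex<double> complex_log_of_minus_one() {
    return std::log(std::complex<double>(-1.0, 0.0));
}
```

The declared type of the variable receiving the result plays no role in either case; only the argument type does.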

Operator overloads ARE functions and follow the same rules.

+26
Sep 12 '14 at 17:31

This return statement

 return "" + c; 

uses so-called pointer arithmetic. The string literal "" is converted to a pointer to its first character (in this case, its terminating zero), and the integer value stored in c is added to that pointer. Thus the result of the expression

 "" + c 

has type const char *

The std::string class has a converting constructor that takes an argument of type const char *. The problem is that this pointer can point beyond the string literal. Thus the function has undefined behavior.

I see no reason to use this expression. If you want to build a string based on one character, you can write, for example,

 return std::string( 1, c ); 

The difference between C++ and C# is that in C# string literals have the type System.String, which overloads operator + for strings and characters (which are Unicode characters in C#). In C++, string literals are constant character arrays, and the semantics of operator + for arrays and integers are different: arrays are converted to pointers to their first elements, and pointer arithmetic is used.

It is the standard class std::string that overloads operator + for characters. String literals in C++ are not objects of that class; that is, they are not of type std::string.

+9
Sep 12 '14 at 16:23