Is string::c_str() no longer null-terminated in C++11?

Question

In C++11, basic_string::c_str is defined to be exactly the same as basic_string::data, which is in turn defined to be exactly the same as *(begin() + n) and *(&*begin() + n) (when 0 <= n < size()).

I cannot find anything that requires a string to always have a null character at the end.

Does this mean that c_str() is no longer guaranteed to return a null-terminated string?

+72

c++ string c++11

Mankarse Sep 26 2018-11-11T00:


4 answers

Well, it is in fact true that the new standard stipulates that .data() and .c_str() are now synonyms. However, it does not say that .c_str() is no longer null-terminated :)

It just means that you can now rely on .data() being null-terminated as well.

Paper N2668 defines the c_str() and data() members of std::basic_string as follows:
  const charT* c_str() const;
  const charT* data() const;
Returns: a pointer to the initial element of an array of length size() + 1 whose first size() elements equal the corresponding elements of the string controlled by *this and whose last element is a null character specified by charT().
Requires: the program shall not alter any of the values stored in the character array.

Note that this does NOT mean that any valid std::string can be treated as a C-string, because a std::string can contain embedded nulls, which prematurely end the C-string when it is used directly as a const char*.
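To illustrate the embedded-null caveat, here is a minimal sketch. The helper names c_length, make_embedded and demo_embedded_null are made up for this example; only std::string, c_str(), size() and std::strlen are real library names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: the length a C API sees, i.e. up to the first null.
std::size_t c_length(const std::string& s) {
    return std::strlen(s.c_str());
}

// Hypothetical helper: builds a five-character string with a null in the middle.
std::string make_embedded() {
    return std::string("ab\0cd", 5);
}

void demo_embedded_null() {
    const std::string s = make_embedded();
    assert(s.size() == 5);                // the string keeps all five characters
    assert(c_length(s) == 2);             // strlen stops at the embedded null
    assert(s.c_str()[s.size()] == '\0');  // the terminator still sits at index size()
}
```

The buffer is null-terminated either way; what a C function misses is everything after the first embedded null.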

Addendum:

I don't have access to the actual published final C++11 specification, but it appears that the wording was indeed dropped somewhere in the revision history of the spec: for example, http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2011/n3242.pdf

§ 21.4.7 basic_string string operations [string.ops]
§ 21.4.7.1 basic_string accessors [string.accessors]

  const charT* c_str() const noexcept;
  const charT* data() const noexcept;
Returns: a pointer p such that p + i == &operator[](i) for each i in [0,size()].
Complexity: constant time.
Requires: the program shall not alter any of the values stored in the character array.
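The Returns clause quoted above can be checked directly on a given implementation. The following is only a sketch: accessors_agree and demo_accessors are hypothetical names, and passing the check on one compiler shows what that implementation does, not what the standard requires.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: tests p + i == &operator[](i) for each i in [0, size()],
// and that c_str() and data() return the same pointer.
bool accessors_agree(const std::string& s) {
    const char* p = s.c_str();
    if (p != s.data()) return false;
    for (std::size_t i = 0; i <= s.size(); ++i) {
        // i == size() is allowed here: the const operator[] returns a
        // reference to charT() at that position.
        if (p + i != &s[i]) return false;
    }
    return p[s.size()] == '\0';
}

void demo_accessors() {
    assert(accessors_agree(std::string()));
    assert(accessors_agree(std::string("c_str")));
}
```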

+23

sehe Sep 26 '11 at 11:05


The "story" is that, a long time ago, when everyone worked in single threads, or at least each thread worked with its own data, a string class was designed for C++ that made string handling easier than it had been before, and operator+ was overloaded to concatenate strings.

The problem was that users would do something like:

 s = s1 + s2 + s3 + s4;

and each concatenation would create a temporary string of its own.

So someone had the brainwave of "lazy evaluation": internally you could store some kind of "rope" holding all the strings until someone wanted to read it as a C-string, at which point you would change the internal representation to a contiguous buffer.

This solved the problem above but caused a load of other headaches, particularly in the multi-threaded world, where .c_str() was expected to be a read-only operation that changes nothing and therefore needs no locking. Premature internal locking in the class implementation, just in case someone was using it from multiple threads (when there was not even a threading standard), was also not a good idea. In fact it was more costly than simply copying the buffer each time. For the same reason, copy-on-write was dropped from string implementations.

Making .c_str() a truly immutable operation therefore turned out to be the most sensible thing to do. But could you rely on that in a standard that is now thread-aware? The new standard decided to state clearly that you can, and so the internal representation needs to hold the null terminator.

+9

CashCow Oct 24 '12 at 11:15


Well spotted. This is certainly a defect in the recently adopted standard; I'm sure there was no intent to break all of the code currently using c_str. I would suggest filing a defect report, or at least asking the question in comp.std.c++ (which will usually end up before the committee if it concerns a defect).

+2

James Kanze Sep 26 '11 at 11:05


Mikhail Glushenkov · Accepted Answer · 2011-09-26 11:09

Strings are now required to use null-terminated buffers internally. Look at the definition of operator[] (21.4.5):

Requires: pos <= size().
Returns: *(begin() + pos) if pos < size(), otherwise a reference to an object of type T with value charT(); the referenced value shall not be modified.

Looking back at c_str (21.4.7.1/1), we see that it is defined in terms of operator[]:

Returns: a pointer p such that p + i == &operator[](i) for each i in [0,size()].

Both c_str and data are required to be O(1), so the implementation is effectively forced to use null-terminated buffers.

In addition, as David Rodríguez - dribeas points out, the Returns requirement also means that you can use &operator[](0) as a synonym for c_str(), so the terminating null character must lie in the same buffer (since *(p + size()) must equal charT()); this also means that even if the terminator is initialized lazily, it is not possible to observe the buffer in the intermediate state.
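As a small sketch of that last point, &s[0] can be handed to a C function exactly where c_str() would go, even for an empty string. length_via_index and demo_index_as_c_str are hypothetical names used only for this illustration, and the sketch assumes a C++11-conforming library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: passes &s[0] to a C API in place of s.c_str().
std::size_t length_via_index(const std::string& s) {
    return std::strlen(&s[0]);
}

void demo_index_as_c_str() {
    const std::string empty;
    const std::string word("null");
    assert(&word[0] == word.c_str());      // same buffer, same start
    assert(length_via_index(word) == 4);   // terminator found after four characters
    assert(&empty[0] == empty.c_str());    // pos == size() == 0 is a valid index
    assert(length_via_index(empty) == 0);  // the only element is charT()
}
```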
