Does C++11 std::string::operator[] return a null-terminated buffer?

I have an object of class std::string that I need to pass to a C function. The function works on a char* buffer: it iterates over the buffer and searches for the terminating null character.

So, I have something like this:

    // C function
    void foo(char* buf);

    // C++ code
    std::string str("str");
    foo(&str[0]);

Assume we are using C++11, so we have a guarantee that std::string stores its characters contiguously.

But I wonder whether there is a guarantee that &str[0] points to a buffer that ends with \0. Yes, there is the c_str member function, but I am asking about operator[].

Can someone quote the standard?

+5
2 answers

In practice, yes. There are exactly zero standards-conforming std::string implementations that do not store a NUL character at the end of the buffer.

So if you are not interested in the theory, you are done.

However, if you are curious about the finer points of the standard:


In C++14, yes. There is a clear requirement that [] returns elements of a contiguous sequence, that [size()] returns a NUL character, and that const methods cannot change state. So *((&str[0])+size()) must be the same as str[size()], and str[size()] must be NUL, so the game is over.
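The chain of equalities above can be sketched as a small check. This is only an illustration: the helper name terminator_via_pointer is hypothetical, and the asserts show what a given implementation does rather than prove what the standard requires.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: walks through the steps of the C++14 argument
// and returns the character found one past the last element.
char terminator_via_pointer(std::string& str) {
    const std::size_t n = str.size();
    char* first = &str[0];            // start of the contiguous storage
    const std::string& cstr = str;    // const view, so const operator[] is used

    assert(first + n == &cstr[n]);    // contiguity: element n lives at first + n
    assert(cstr[n] == '\0');          // str[size()] has the value charT()

    return *(first + n);              // therefore this reads the NUL
}
```

Calling terminator_via_pointer on a std::string holding "str", or on an empty string, is expected to return '\0' on the mainstream standard library implementations.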


In C++11, almost certainly. There are rules that const methods cannot change state. There are guarantees that data() and c_str() return a null-terminated buffer that agrees with [] at every index.

A contorted reading of the C++11 standard would say that, prior to any call to data() or c_str(), [size()] does not return the NUL terminator at the end of the buffer but rather a static const CharT stored separately, and that the buffer holds an uninitialized (or even trap) value where the NUL should be. Because const methods cannot change state, I believe this reading is incorrect.

That reading would require &str[str.size()] to change across a call to .data(), which is an observable state change in the string caused by a const call, and I would consider that illegal.

An alternative way to get around the standard might be to leave str[str.size()] uninitialized until you legally access it by calling .data(), .c_str(), or actually passing str.size() to operator[]. Since the standard defines no ways to access that element other than those three, you could stretch things and say that lazy NUL initialization is legal.

I would question this. The definition of .data() implies that the values returned by [] are contiguous, so &str[0] is the same address as .data(). Since .data()+.size() is guaranteed to point to a NUL CharT, so must (&str[0])+.size(), and with no non-const methods called, the state of the std::string cannot change between the calls.

But what if the compiler can look and see that you never call .data() or .c_str()? Does the contiguity requirement still hold if it can be proved that you never call them?

At which point I throw up my hands and shoot the hostile compiler.
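As a rough sketch of the observation this argument relies on, the check below reads the terminator through &str[0] before any call to data() or c_str(), then confirms that neither the address nor the value changes afterwards. The helper name stable_across_const_calls is hypothetical, and the check only reports the behaviour of the implementation it runs on.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: true if &str[0] and the terminator look the same
// before and after the const accessors data() and c_str() are called.
bool stable_across_const_calls(std::string& str) {
    char* before = &str[0];
    // Read the element at size() before data()/c_str() have been called.
    char end_before = *(before + str.size());

    const char* d = str.data();
    const char* c = str.c_str();
    assert(d == c);                   // both accessors expose the same array

    char* after = &str[0];
    return before == after            // const calls did not move the buffer
        && d == before                // data() agrees with &str[0]
        && end_before == '\0'         // terminator was already in place
        && *(d + str.size()) == '\0'; // and is still there afterwards
}
```

On the common implementations this is expected to return true for both short and long strings, which matches the claim that a lazily initialized terminator is not something you meet in practice.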


The standard is worded very passively on this point. So there may be a way to build an arguably standards-compliant std::string that does not follow these rules. But since the guarantees keep getting closer to an explicit requirement that the NUL terminator be there, the odds are against a new compiler appearing that relies on a tortured reading of C++11 to claim conformance.

+7

According to the standard, yes. The underlying character array is accessible through string::data or string::c_str, for which the standard states:

21.4.7.1 basic_string accessors [string.accessors]
const charT* c_str() const noexcept;
const charT* data() const noexcept;

1 Returns: A pointer p such that p + i == &operator[](i) for each i in [0,size()].
2 Complexity: Constant time.
3 Requires: The program shall not alter any of the values stored in the character array.

And to show that it is null-terminated, look at the definition of operator[] (emphasis mine):

21.4.5 basic_string element access [string.access]
const_reference operator[](size_type pos) const;
reference operator[](size_type pos);

1 Requires: pos <= size().
2 Returns: *(begin() + pos) if pos < size(). Otherwise, returns a reference to an object of type charT with value charT(), where modifying the object leads to undefined behavior.
3 Throws: Nothing.
4 Complexity: Constant time.

Thus, operator[](size()) returns charT(), and since std::string is std::basic_string<char>, charT() is '\0'.

This means that in your case *(&str[0] + str.size()) == '\0' must, according to the standard, always be true.
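The two quoted clauses can be combined into a short sketch. The body of foo is not given in the question, so count_until_nul below is a hypothetical stand-in that scans for the terminator, and call_with_string is likewise an illustrative name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the C function in the question:
// walks the buffer until it reaches the terminating '\0'.
std::size_t count_until_nul(char* buf) {
    std::size_t n = 0;
    while (buf[n] != '\0') ++n;
    return n;
}

// Checks p + i == &operator[](i) for each i in [0, size()],
// then hands &str[0] to the C-style function.
std::size_t call_with_string(std::string& str) {
    const std::string& cstr = str;
    const char* p = cstr.data();
    for (std::size_t i = 0; i <= cstr.size(); ++i) {
        assert(p + i == &cstr[i]);
    }
    return count_until_nul(&str[0]);
}
```

For a string holding "str" the scan stops after three characters. A string with an embedded '\0' would stop earlier, because the C function cannot tell an embedded NUL from the terminator.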


Beware that modifying the object returned by operator[](size()) is UB.
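A minimal sketch of a C-style routine that stays on the safe side of that rule: it writes only to elements below size() and merely reads the terminator. Both replace_char and with_replaced are hypothetical names used for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical C-style routine: replaces every `from` with `to` in a
// NUL-terminated buffer. The terminator is read but never written.
void replace_char(char* buf, char from, char to) {
    for (std::size_t i = 0; buf[i] != '\0'; ++i) {
        if (buf[i] == from) buf[i] = to;
    }
}

// Applies replace_char to a copy of the string through &str[0].
std::string with_replaced(std::string str, char from, char to) {
    // Writing '\0' into the middle would truncate what C code sees,
    // while size() would stay unchanged.
    assert(to != '\0');
    replace_char(&str[0], from, to);
    return str;
}
```

The size of the string is unaffected, since only existing elements are overwritten.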

+5