What is the equivalent of CPython string concatenation in C++?

Possible duplicate:
Simple string concatenation

Yesterday, as I write this, someone asked on SO:

if I have a string x='wow' and apply the function add in Python:

    x='wow'
    x.add(x)
    'wowwow'

how can I do this in C++?

With add (which does not exist) corrected to __add__ (a standard method), this is a deep and interesting question. It involves subtle low-level details, high-level algorithmic complexity considerations, and even threading, and yet it is formulated in a very short and concise way.

I am reposting the original question as my own because I did not get a chance to provide a correct answer before it was deleted, and my efforts to have the original question revived, so that I could help improve the general understanding of these issues, failed.

I changed the original title, "select python or C++", to …

  • What is the equivalent of CPython string concatenation in C++?

thereby narrowing the question a bit.

1 answer

The general meaning of the code snippet.

This code snippet

    x = 'wow'
    x.__add__( x )

has different meanings in Python 2.x and Python 3.x.

In Python 2.x, strings are by default narrow strings, with one byte per encoding unit, corresponding to C++ strings based on char.

In Python 3.x, strings are wide strings, guaranteed to represent Unicode, corresponding to the in-practice use of C++ strings based on wchar_t, and likewise with an unspecified 2 or 4 bytes per encoding unit.

Disregarding efficiency, the __add__ method behaves the same in both major Python versions, and corresponds to the C++ + operator for std::basic_string (i.e. for std::string and std::wstring), e.g. quoting the CPython 3k documentation:

object.__add__(self, other)
… to evaluate the expression x + y, where x is an instance of a class that has an __add__() method, x.__add__(y) is called.

So, as an example, the CPython 2.7 code

    x = 'wow'
    y = x.__add__( x )
    print y

which would normally be written as

    x = 'wow'
    y = x + x
    print y

corresponds to this C++ code:

    #include <iostream>
    #include <string>
    using namespace std;

    int main()
    {
        auto const x = string( "wow" );
        auto const y = x + x;
        cout << y << endl;
    }

A main difference from the many incorrect answers given to the original question is that the C++ correspondence is an expression, not an update.

It is perhaps natural to think that the method name __add__ signifies a change of the string object's value, an update, but with respect to observable behavior Python strings are immutable. Their values never change, as far as can be directly observed in Python code. This is the same as in Java and C#, but very unlike C++'s mutable std::basic_string.
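The distinction between an expression and an update can be made concrete in C++. The following is a minimal sketch; the function names `concatenated` and `append_in_place` are hypothetical and chosen only for illustration.

```cpp
#include <string>

// Expression-style concatenation: builds and returns a new string,
// like Python's x + y. Neither argument is modified.
std::string concatenated( std::string const& a, std::string const& b )
{
    return a + b;
}

// Update-style concatenation: changes the existing object in place.
// Python's str offers no observable equivalent of this.
void append_in_place( std::string& target, std::string const& suffix )
{
    target += suffix;
}
```

After `auto const y = concatenated( x, x );` the variable `x` still holds its old value, whereas `append_in_place( x, "wow" );` changes `x` itself.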

The quadratic-to-linear time optimization in CPython.

CPython 2.4 added the following optimization, for narrow strings only:

String concatenations in statements of the form s = s + "abc" and s += "abc" are now performed more efficiently in certain circumstances. This optimization won't be present in other Python implementations such as Jython, so you shouldn't rely on it; using the join() method of strings is still recommended when you want to efficiently glue a large number of strings together. (Contributed by Armin Rigo.)

This may not sound like much, but where it is applicable this optimization reduces a sequence of concatenations from quadratic time O(n²) to linear time O(n), in the length n of the final result.

Primarily, the optimization replaces concatenations with updates, e.g. as if

    x = x + a
    x = x + b
    x = x + c

or for that matter

 x = x + a + b + c 

had been replaced with

    x += a
    x += b
    x += c

In the general case there will be many references to the string object that x refers to, and since Python string objects must appear to be immutable, the first update assignment cannot change that string object. It therefore generally has to create a completely new string object and assign a reference to it to x.

At this point x holds the only reference to that new object. This means that the object can be updated in place by the update assignment that appends b, because there are no observers. The same goes for the append of c.

This is a bit like quantum mechanics: you cannot observe this dirty thing going on, and it is never done when there is a possibility of anyone observing the machinations, but you can infer that it must have been going on from the performance statistics you collect, because linear time is very different from quadratic time!
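The reference-count reasoning above can be mimicked in C++ with std::shared_ptr. This is only a rough analogy under stated assumptions, not how CPython is implemented: `StrRef` and `concat_assign` are hypothetical names, and the sketch assumes single-threaded use, where `use_count()` is exact.

```cpp
#include <memory>
#include <string>

// Hypothetical stand-in for a Python variable: a counted reference to a string.
using StrRef = std::shared_ptr<std::string>;

// Mimics "x = x + suffix" with the sole-reference optimization.
// Assumption: single-threaded use, so use_count() can be trusted.
void concat_assign( StrRef& x, std::string const& suffix )
{
    if( x.use_count() == 1 )
    {
        // x is the only reference, so nobody can observe an in-place update.
        *x += suffix;
    }
    else
    {
        // Other references exist: the old object must keep its value,
        // so build a new string object and rebind x to it.
        x = std::make_shared<std::string>( *x + suffix );
    }
}
```

With a second reference to the same object alive, the first call creates a new object and leaves the old value untouched; later calls on the now unshared object append in place.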

How is linear time achieved? Well, with in-place updates the same buffer-doubling strategy as in C++ std::basic_string can be used, which means that the existing buffer contents only need to be copied at each buffer reallocation, and not for each append operation. The total cost of copying is therefore at worst linear in the final string size, in the same way that the sum (representing the cost of copying at each buffer doubling) 1 + 2 + 4 + 8 + … + N is less than 2*N.
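The arithmetic can be checked with a small cost model that counts copied characters when n single characters are appended one at a time. This is a sketch under stated assumptions, not a measurement: both function names are hypothetical, the initial capacity of 1 is arbitrary, and the growth factor of a real std::string is implementation-defined.

```cpp
#include <cstddef>

// Characters copied when every append builds a fresh string:
// producing a string of length k copies all k of its characters.
std::size_t copies_with_fresh_strings( std::size_t n )
{
    std::size_t copied = 0;
    for( std::size_t length = 1; length <= n; ++length )
    {
        copied += length;
    }
    return copied;      // Equals n*(n + 1)/2, i.e. quadratic in n.
}

// Characters copied when appending into a buffer whose capacity
// doubles each time it is full (assumed initial capacity: 1).
std::size_t copies_with_doubling( std::size_t n )
{
    std::size_t capacity = 1;
    std::size_t length = 0;
    std::size_t copied = 0;
    for( std::size_t i = 0; i < n; ++i )
    {
        if( length == capacity )
        {
            copied += length;   // Reallocation: copy the existing contents.
            capacity *= 2;
        }
        ++copied;               // Store the newly appended character.
        ++length;
    }
    return copied;      // Less than 3*n: under 2*n for reallocations, plus n stores.
}
```

For n = 8 the model gives 36 copied characters with fresh strings and 15 with doubling; the gap widens quadratically as n grows.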

Linear time string concatenation expressions in C++.

In order to faithfully reproduce the CPython code snippet in C++,

  • the final result and the expression nature of the operation should be captured,

  • and its performance characteristics should also be captured!

A direct translation of CPython __add__ to C++ std::basic_string + fails to reliably capture the CPython linear time. The C++ + string concatenation may be optimized by the compiler in the same way as the CPython optimization. Or it may not, which means that one has told a beginner that the C++ equivalent of a Python linear time operation is something with quadratic time: hey, this is what you should use …

For the performance characteristics, C++ += is the basic answer, but it does not capture the expression nature of the Python code.

The natural answer is a linear time C++ string builder class that translates a concatenation expression into a series of += updates, so that the Python code

    from __future__ import print_function

    def foo( s ):
        print( s )

    a = 'alpha'
    b = 'beta'
    c = 'charlie'
    foo( a + b + c )    # Expr-like linear time string building.

corresponds roughly to

    #include <string>
    #include <sstream>

    namespace my {
        using std::string;
        using std::ostringstream;

        template< class Type >
        string stringFrom( Type const& v )
        {
            ostringstream stream;
            stream << v;
            return stream.str();
        }

        class StringBuilder
        {
        private:
            string      s_;

            template< class Type >
            static string fastStringFrom( Type const& v )
            {
                return stringFrom( v );
            }

            static string const& fastStringFrom( string const& s )
            { return s; }

            static char const* fastStringFrom( char const* const s )
            { return s; }

        public:
            template< class Type >
            StringBuilder& operator<<( Type const& v )
            {
                s_ += fastStringFrom( v );
                return *this;
            }

            string const& str() const { return s_; }
            char const* cStr() const { return s_.c_str(); }

            operator string const& () const { return str(); }
            operator char const* () const { return cStr(); }
        };
    }  // namespace my

    #include <iostream>
    using namespace std;
    typedef my::StringBuilder S;

    void foo( string const& s )
    {
        cout << s << endl;
    }

    int main()
    {
        string const    a   = "alpha";
        string const    b   = "beta";
        string const    c   = "charlie";

        foo( S() << a << b << c );      // Expr-like linear time string building.
    }
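When all the parts are already strings, a smaller alternative is a function that first sums the part lengths, reserves the buffer once and then appends. This is a sketch of a different technique than the builder class above, assuming C++17 for std::string_view; the name `concat` is hypothetical.

```cpp
#include <cstddef>
#include <initializer_list>
#include <string>
#include <string_view>

// Hypothetical helper: joins all parts in time linear in the result length.
std::string concat( std::initializer_list<std::string_view> parts )
{
    // First pass: compute the final length.
    std::size_t total = 0;
    for( std::string_view const part : parts )
    {
        total += part.size();
    }

    // Second pass: reserve once, so the appends below need no reallocation.
    std::string result;
    result.reserve( total );
    for( std::string_view const part : parts )
    {
        result += part;
    }
    return result;
}
```

With the variables from the example above, the call `foo( concat( { a, b, c } ) )` is also an expression, like the Python `a + b + c`. Unlike the builder class it accepts only string-like arguments, not arbitrary streamable values.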