Why shouldn't one derive from the C++ std::string class?

I wanted to ask about a specific point made in Effective C++.

It says:

A destructor must be virtual if the class is to act as a polymorphic class. It further adds that since std::string does not have a virtual destructor, one should never derive from it. Also, std::string is not even designed to be a base class, let alone a polymorphic base class.

I don't understand what exactly is required of a class to be eligible to be a base class (not a polymorphic one)?

Is the only reason I should not derive from std::string that it does not have a virtual destructor? For reuse purposes, a base class can be defined and multiple derived classes can inherit from it. So what makes std::string ineligible even as a base class?

Also, if there is a base class defined purely for reuse and there are many derived types, is there a way to prevent client code from doing Base* p = new Derived(), given that the classes are not meant to be used polymorphically?

+57
c++ string inheritance stl
May 15 '11 at 6:26 a.m.
7 answers

I think this statement reflects the confusion here (emphasis mine):

I don't understand what exactly is required of a class to be eligible to be a base class ( not a polymorphic one )?

In idiomatic C++, there are two uses for deriving from a class:

  • private inheritance, used for mixins and aspect-oriented programming using templates.
  • public inheritance, used for polymorphic situations only. EDIT: Okay, I guess this could be used in a few mixin scenarios too, such as boost::iterator_facade, which show up when the CRTP is in use.

There is absolutely no reason to publicly derive from a class in C++ if you are not trying to do something polymorphic. The language comes with free functions as a standard feature, and free functions are what you should be using here.

Think of it this way: do you really want to force clients of your code to convert to some proprietary string class simply because you want to tack on a few methods? Because unlike in Java or C# (or most similar object-oriented languages), when you derive a class in C++, most users of the base class need to know about that kind of change. In Java/C#, classes are usually accessed through references, which are similar to C++ pointers. There is therefore a level of indirection that decouples the clients of your class, allowing you to substitute a derived class without other clients knowing.

However, in C++, classes are value types, unlike in most other OO languages. The easiest way to see this is what is known as the slicing problem. Basically, consider:

    #include <sstream>
    #include <stdexcept>
    #include <string>

    int StringToNumber(std::string copyMeByValue)
    {
        std::istringstream converter(copyMeByValue);
        int result;
        if (converter >> result)
        {
            return result;
        }
        throw std::logic_error("That is not a number.");
    }

If you pass your own string to this function, the copy constructor for std::string will be called to make the copy, not the copy constructor for your derived object, no matter which child class of std::string is passed. This can lead to inconsistency between your methods and anything attached to the string. StringToNumber cannot simply take whatever your derived object is and copy that, because your derived object probably has a different size than a std::string, but this function was compiled to reserve only the space for a std::string in automatic storage. In Java and C# this is not a problem, because the only things that live in automatic storage are references, and references are always the same size. Not so in C++.

In short: don't use inheritance to tack on methods in C++. It is not idiomatic and it leads to problems with the language. Use non-friend, non-member functions where possible, followed by composition. Don't use inheritance unless you are doing template metaprogramming or want polymorphic behavior. For more information, see Effective C++, Item 23: Prefer non-member non-friend functions to member functions.
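As a rough illustration of the "free function first" advice, here is a minimal sketch of a helper written as a non-member function over std::string instead of as a method on a derived string class. The name starts_with is hypothetical and not taken from the answer.

```cpp
#include <string>

// Hypothetical helper, written as a free function rather than as a member of
// a class derived from std::string. It works on any std::string, so callers
// never have to convert to a custom string type.
inline bool starts_with(const std::string& text, const std::string& prefix) {
    return text.size() >= prefix.size() &&
           text.compare(0, prefix.size(), prefix) == 0;
}
```

Calling starts_with(id, "MYAPP") works for every existing std::string in the code base, which is the property the answer is arguing for.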

EDIT: Here is a more complete example showing the slicing problem. You can see its output on codepad.org

    #include <iostream> // std::cout, std::endl

    struct Base
    {
        int aMemberForASize;
        Base() { std::cout << "Constructing a base." << std::endl; }
        Base(const Base&) { std::cout << "Copying a base." << std::endl; }
        ~Base() { std::cout << "Destroying a base." << std::endl; }
    };

    struct Derived : public Base
    {
        int aMemberThatMakesMeBiggerThanBase;
        Derived() { std::cout << "Constructing a derived." << std::endl; }
        Derived(const Derived&) : Base() { std::cout << "Copying a derived." << std::endl; }
        ~Derived() { std::cout << "Destroying a derived." << std::endl; }
    };

    // Takes Base by value, so a Derived argument is sliced down to a Base copy.
    int SomeThirdPartyMethod(Base /* SomeBase */)
    {
        return 42;
    }

    int main()
    {
        Derived derivedObject;
        {
            //Scope to show the copy behavior of copying a derived.
            Derived aCopy(derivedObject);
        }
        SomeThirdPartyMethod(derivedObject);
    }
+51
May 15 '11 at 7:12

To offer the counter side to the general advice (which is sound when no particular verbosity or productivity issues are evident)...

Useful scenario

There is at least one scenario in which public derivation from bases without virtual destructors can be a good decision:

  • you want some of the type-safety and code-readability benefits provided by dedicated user-defined types (classes)
  • an existing base is ideal for storing the data, and allows low-level operations that client code would also want to use
  • you want the convenience of reusing functions that support that base class
  • you understand that any additional invariants your data logically needs can only be enforced in code that explicitly accesses the data as the derived type; depending on how "naturally" that happens in your design, and how much you can trust client code to understand and cooperate with the logically ideal invariants, you may want member functions of the derived class to re-verify expectations (and throw or whatever)
  • the derived class adds some highly type-specific convenience functions that operate on the data, such as custom searches, data filtering / modification, streaming, statistical analysis, (alternative) iterators
  • coupling client code to the base is more appropriate than coupling it to the derived class (because the base is either stable, or changes to it reflect improvements in functionality that is also core to the derived class)
    • put another way: you want the derived class to keep exposing the same API as the base class, even if that means client code is forced to change, rather than insulating it in some way that allows the base and derived APIs to grow out of sync
  • you are not going to mix pointers to base and derived objects in the parts of the code responsible for deleting them

This may sound quite restrictive, but there are plenty of cases in real-world programs that match this scenario.

Background discussion: relative merits

Programming is about compromises. Before you write a more conceptually "correct" program:

  • consider whether it requires added complexity and code that obscures the real program logic, and is therefore more error-prone overall despite handling one specific issue more robustly,
  • weigh the practical costs against the probability and consequences of problems, and
  • consider "return on investment" and what else you could be doing with your time.

If the potential problems involve uses of the objects that you just can't imagine anyone attempting, given your insight into their accessibility, scope and nature of use in the program, or if you can generate compile-time errors for dangerous use (for example, an assertion that the derived class's size matches the base's, which would prevent adding new data members), then anything more may be premature over-engineering. Take the easy win of a clean, intuitive, concise design and code.
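The size assertion mentioned above could look roughly like the sketch below. It assumes C++11 (static_assert), and the class Path_String with its is_absolute() member is a hypothetical example, not something from the answer.

```cpp
#include <string>

// Hypothetical thin wrapper: adds a member function but no data members.
class Path_String : public std::string {
public:
    // True for paths such as "/usr/lib" (assumes POSIX-style paths).
    bool is_absolute() const { return !empty() && front() == '/'; }
};

// Compile-time guard: the build fails if a later change makes Path_String
// larger than std::string, which is what adding a data member normally does.
static_assert(sizeof(Path_String) == sizeof(std::string),
              "Path_String must not add data members to std::string");
```

The check only guards object size. It does not make deleting a Path_String through a std::string pointer well-defined, so the rule about not mixing base and derived pointers where objects are deleted still applies.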

Reasons to consider derivation without a virtual destructor

Say you have a class D publicly derived from B. With no effort, the operations on B are possible on D (with the exception of construction, but even if there are a lot of constructors you can often provide effective forwarding with one template for each distinct number of constructor arguments: e.g. template <typename T1, typename T2> D(const T1& t1, const T2& t2) : B(t1, t2) { } . A better generalised solution is C++0x variadic templates.)
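A minimal sketch of the variadic-template forwarding mentioned above, assuming C++11 and using std::string as the base B. The class name D follows the text; everything else is illustrative.

```cpp
#include <string>
#include <utility>

class D : public std::string {
public:
    // Forward any argument list to the matching std::string constructor,
    // replacing the "one template per argument count" approach.
    template <typename... Args>
    explicit D(Args&&... args) : std::string(std::forward<Args>(args)...) {}
};
```

With this, D a("hello"); D b(3, 'x'); and D c; all compile and reach the corresponding std::string constructors. One caveat of this design: such a greedy template can be selected in preference to the copy constructor for non-const lvalues, so inheriting constructors (using B::B;, also C++11) may be the simpler choice where the compiler supports them.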

Also, if B changes then by default D exposes those changes, staying in sync, but someone may need to review the extended functionality introduced in D to see whether it remains valid, along with the client usage.

To rephrase: there is reduced explicit coupling between the base and derived class, but increased coupling between the base and the client.

This is often NOT what you want, but sometimes it is ideal, and at other times it is a non-issue (see the next paragraph). Changes to the base force more client code changes in places scattered throughout the code base, and sometimes the people changing the base may not even have access to the client code to review or update it accordingly. Sometimes it is better, though: if you, as the provider of the derived class (the "man in the middle"), want base class changes to feed through to clients, and you generally want clients to be able, and sometimes forced, to update their code when the base class changes without you needing to be constantly involved, then public derivation may be ideal. This is common when your class is not so much an independent entity in its own right as a thin value-add to the base.

At other times the base class interface is so stable that the coupling can be deemed a non-issue. This is especially true of classes like the standard containers.

In summary, public derivation is a quick way to get, or approximate, the ideal, familiar base class interface for the derived class, in a way that is concise and self-evidently correct to both the maintainer and the client coder, with the additional functionality available as member functions (which IMHO, and this obviously differs from Sutter, Alexandrescu etc., can aid usability and readability and assist productivity-enhancing tools including IDEs).

C++ Coding Standards - Sutter & Alexandrescu - cons examined

Item 35 of C++ Coding Standards lists problems with the scenario of deriving from std::string. As scenarios go, it is good in that it illustrates the burden of exposing a large but useful API, but it is both good and bad in that the base API is remarkably stable, being part of the standard library. A stable base is a common situation, but no more common than a volatile one, and a good analysis should address both cases. While considering the book's list of issues, I will specifically contrast how the issues apply to these cases:

a) class Issue_Id : public std::string { ...handy stuff... }; <- public derivation, our controversial usage
b) class Issue_Id : public string_with_virtual_destructor { ...handy stuff... }; <- safer OO derivation
c) class Issue_Id { public: ...handy stuff... private: std::string id_; }; <- compositional approach
d) using std::string everywhere, with free-standing support functions

(Hopefully we can agree that composition is acceptable practice, as it provides encapsulation, type safety, and a potentially enriched API over and above that of std::string .)

So, say you are writing some new code and start thinking about the conceptual entities in an OO sense. Perhaps in a bug tracking system (I'm thinking of JIRA), one of them is, say, an Issue_Id. The data content is textual, consisting of an alphabetic project id, a hyphen, and an incrementing issue number: e.g. "MYAPP-1234". Issue ids can be stored in a std::string , and there will be plenty of small text searches and manipulation operations needed on issue ids: a large subset of those already provided by std::string , and a few more for good measure (for example, getting the project id component, or providing the next possible issue id (MYAPP-1235)).
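To make the scenario concrete, here is a rough sketch of option a) for the Issue_Id described above. The member names get_number() and next() are hypothetical (only get_project_id() is named later in the answer), and the code assumes a well-formed id of the form PROJECT-NUMBER; std::stoul throws on anything else.

```cpp
#include <string>

// Sketch of option a): Issue_Id publicly derives from std::string.
class Issue_Id : public std::string {
public:
    explicit Issue_Id(const std::string& text) : std::string(text) {}

    // "MYAPP-1234" -> "MYAPP"
    std::string get_project_id() const { return substr(0, find('-')); }

    // "MYAPP-1234" -> 1234 (hypothetical helper; assumes digits after '-')
    unsigned long get_number() const { return std::stoul(substr(find('-') + 1)); }

    // "MYAPP-1234" -> "MYAPP-1235" (hypothetical helper)
    Issue_Id next() const {
        return Issue_Id(get_project_id() + "-" + std::to_string(get_number() + 1));
    }
};
```

Everything here is expressed through the public std::string interface (find, substr), and the whole std::string API (size(), comparison, streaming) stays available on an Issue_Id without any forwarding code.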

On to Sutter and Alexandrescu's list of issues...

Nonmember functions work well within existing code that already manipulates string s. If instead you supply a super_string , you force changes through your code base to change types and function signatures to super_string .

The fundamental mistake in this claim (and in most of the ones below) is that it promotes the convenience of using only a few types while ignoring the benefits of type safety. It expresses a preference for d) above, rather than any insight into c) or b) as alternatives to a). The art of programming involves balancing the pros and cons of distinct types to achieve reasonable reuse, performance, convenience and safety. The paragraphs below elaborate on this.

With public derivation, existing code can implicitly access the base class string as a string and continue to behave as it always has. There is no specific reason to think that the existing code would want to use any additional functionality from super_string (in our case Issue_Id)... in fact, it is often lower-level support code that pre-dates the application for which you are creating the super_string , and is therefore oblivious to the needs served by the extended functions. For example, say there is a non-member function to_upper(std::string&, std::string::size_type from, std::string::size_type to) : it can still be applied to an Issue_Id .
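A small sketch of that point: the to_upper signature is the one from the text, but its body and the bare-bones Issue_Id are assumptions made for illustration.

```cpp
#include <cctype>
#include <string>

// Pre-existing support code (hypothetical body): upper-cases s[from, to).
void to_upper(std::string& s, std::string::size_type from, std::string::size_type to) {
    if (to > s.size()) to = s.size();
    for (std::string::size_type i = from; i < to; ++i) {
        s[i] = static_cast<char>(std::toupper(static_cast<unsigned char>(s[i])));
    }
}

// Hypothetical derived type; the support code above knows nothing about it.
class Issue_Id : public std::string {
public:
    explicit Issue_Id(const std::string& text) : std::string(text) {}
};
```

Given Issue_Id id("myapp-1234"); the call to_upper(id, 0, 5); compiles unchanged, because an Issue_Id& converts implicitly to std::string&. With the compositional approach c), the same call would need an accessor or a forwarding overload.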

So, unless the non-member support function is being cleaned up or extended at the deliberate cost of coupling it tightly to the new code, it does not need to be touched. If it is being overhauled to support issue ids (for example, using knowledge of the data content format to upper-case only the leading alpha characters), then it is probably a good thing to make sure it really is being passed an Issue_Id , by creating an overload along the lines of to_upper(Issue_Id&) and sticking to either the derivation or the compositional approach, both of which allow type safety. Whether super_string or composition is used makes no difference to effort or maintainability. A reusable free-standing to_upper_leading_alpha_only(std::string&) support function is unlikely to be of much use: I can't recall the last time I wanted such a function.

The impulse to use std::string everywhere is not qualitatively different from accepting all your arguments as containers of variants or void* s so that you never have to change your interfaces to accept arbitrary data, but it makes for error-prone implementations and code that is less self-documenting and less verifiable by the compiler.

Interface functions that take a string now need to: a) stay away from super_string 's added functionality (unuseful); b) copy their argument to a super_string (wasteful); or c) cast the string reference to a super_string reference (awkward and potentially illegal).

This seems to revisit the first point, old code that needs to be refactored to use the new functionality, although this time it is client code rather than support code. If a function wants to start treating its argument as an entity for which the new operations are relevant, then it should start taking its arguments as that type, and clients should generate and accept them using that type. Exactly the same issues exist for composition. Otherwise, c) can be practical and safe if the guidelines listed below are followed, though it is ugly.

super_string 's member functions don't have any more access to string 's internals than nonmember functions, because string probably doesn't have protected members (remember, it wasn't meant to be derived from in the first place)

True, but sometimes that is a good thing. A lot of base classes have no protected data. The public string interface is all that is needed to manipulate the contents, and useful functionality (for example, the get_project_id() postulated above) can be elegantly expressed in terms of those operations. Conceptually, many of the times I have derived from standard containers, I did not want to extend or customise their functionality along the existing lines (they are already "perfect" containers); rather, I wanted to add another dimension of behaviour that is specific to my application and requires no private access. It is because they are already good containers that they are good to reuse.

If super_string hides some of string 's functions (and redefining a non-virtual function in a derived class is not overriding, it is just hiding), that could cause widespread confusion in code that manipulates string s that started their life converted automatically from super_string s.
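The hiding behaviour described in that quote can be seen in a short sketch. The redefinition of size() below is a deliberately bad, hypothetical example; length_via_base is likewise an illustrative name.

```cpp
#include <string>

class super_string : public std::string {
public:
    explicit super_string(const std::string& text) : std::string(text) {}

    // Hides (does not override) std::string::size(), which is not virtual.
    // Hypothetical behavior: count only the non-space characters.
    std::string::size_type size() const {
        std::string::size_type n = 0;
        for (char c : static_cast<const std::string&>(*this)) {
            if (c != ' ') ++n;
        }
        return n;
    }
};

// Code written against std::string always calls std::string::size(),
// whatever the dynamic type of the argument is.
std::string::size_type length_via_base(const std::string& s) { return s.size(); }
```

For super_string s("a b c"); the call s.size() yields 3, while length_via_base(s) yields 5: the same object reports two different sizes depending on the static type used to reach it.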

True for composition too, and more likely to happen there, since the code does not pass things through by default and hence does not stay in sync; it is also true in some situations with run-time polymorphic hierarchies. Same-named functions that behave differently in classes that initially appear interchangeable are just nasty. OO, ..

, super_string string , [ ]

- , - , - , , , - super_string , , "" ...

, , -, , , .

, , ....

  • , : , , ...
  • - , POD: undefined , -POD-, , , , ..
  • Liskov Substitution/
    • , std::string , : , std::string& ...* , std::string )
    • , , , ; - -
  • : , , , , ..
    • , , - , , .
  • , , ,
    • , , / "private" /

Summary

, , . , - , .

std::map<> , std::vector<> , std::string .. - , . . , , , , , . , , , (- Java, , ..) .

+20
16 '11 7:53

, std::string , . .

?

, , , , , , , .

+10
15 '11 6:40

To stop client code from allocating Derived with new (not that it is a good idea), you can make operator new private:

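    #include <cstddef>
    #include <string>

    class StringDerived : public std::string {
        //...
    private:
        // Private, so client code cannot write `new StringDerived`.
        static void* operator new(std::size_t size);
        static void operator delete(void* ptr);
    };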

With this in place, StringDerived can no longer be allocated with new by client code.

+8
15 '11 6:43

++ std?

. DerivedString ; std::string . , (.. string DerivedString ).

Base* p = new Derived()

. , inline Base Derived . eg.

    class Derived : protected Base { // 'protected' to avoid Base* p = new Derived
    public:
        // Inline wrapper that re-exposes a Base member through Derived.
        const char* c_str() const { return Base::c_str(); }
        //...
    };
+4
15 '11 7:30

, :

  • : ( ++ , )
  • : ,

std::string , (, ), Boost String Algorithm .

, , () .

EDIT

@ , , , , , . , , , , , . , , , , .

, . , (public), ( ).

+2
15 '11 7:48

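In C++, deleting a Derived object through a pointer to Base, when Base has no virtual destructor, is undefined behavior.

C++ standard, 5.3.5/3:

If the static type of the operand is different from its dynamic type, the static type shall be a base class of the operand's dynamic type and the static type shall have a virtual destructor or the behavior is undefined.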

Non-
- -. , .

String?
, - - , , . string, string , . , , -.

​​, , , std::string.

0
15 '11 6:37


