What are some behavioral anomalies with data structures that use reference counting in STL?

Scott Meyer in Effective STL says one of the things to think about when deciding which data structure to use is whether the container uses reference counting or not. He says that there are some behavior anomalies with this approach.

What are some of them? Why do containers like string and rope have abnormal behavior?

+7
source share
4 answers

As others have said, a typical example is std::string . In addition to performance issues when locking in multi-threaded programs, there are streaming problems with links to counted lines. Imagine the following:

 string s = "hello"; string t = s; // s and t share data char &c = t[0]; // copy made here, since t is non-const 

The problem is that non-const operator[] should make a copy of the string if it was split, since the returned link can be used later to change the string (you can assign it without a char link, but operator[] does not know that it must behave differently). On the other hand, const operator[] should avoid creating a copy, as this will eliminate all the advantages of reference counting (this will mean that you always make a copy in practice).

 const char &get_first(const string &s) { return s[0]; // no copy, s is const } string s = "hello"; string t = s; // s and t share data const char &c1 = get_first(t); // no copy made here const char &c2 = t[0]; // copy made, since t is non-const // c1 just got invalidated (in fact, it pointing at s[0], not t[0]). s[0] = 'X'; printf("%c, %c\n", c1, c2); // outputs "X, h" 

As you can see, this distinction is confusing and can cause really unexpected behavior.

Here is an old article about the semantics of copying to a record and its impact on performance: http://www.gotw.ca/gotw/045.htm .

Here's a suggestion with the motivation to change std::string so that it is not considered a reference in the C ++ 11 standard: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2534. html This is what the above example is based on.

+3
source

As an example, one anomaly that can occur with referenced counted strings, in particular strings with “sub-part” processing (with start / end fragment), is “failed lock”.

Suppose you have allocated memory for all the text in a file. Then you parse the file and use some "slice ()", "left ()", "mid ()" or an equivalent method. You can complete the lock of the entire line for the file, while perhaps only a very small part of it contains the actual text data (the remaining numbers that have already been analyzed, punctuation marks, or something else). That way, you might end up using more memory than necessary, while controlling faster peaks. In this case, the second problem may arise if you use multithreading and intensively use some of the lines in different threads: unnecessary memory conflict, line reference counting can increase / decrease all the time, and atomicity can get an int path that slows down all threads.

There is nothing against link counting if you know the potential problems in your application and prevent them (in this case, just do the lines “alone” by copying them).

+1
source

Typically, link counting suffers from common multithreaded race conditions, deadlocks, or excessive synchronization to avoid this.

Then you have context problems that usually require closure, for example, objects can be captured after they obviously go out of scope, but this can be ruled out by STL, I'm not an STL expert.

A discussion of various baroque regional affairs related to smart pointers is discussed here: http://www.velocityreviews.com/forums/t689414-c-primer-4th-edition-reference-counting-smart-pointers.html

+1
source

In element 13, Myers discusses in detail issues with multithreaded and refcounted strings.

This is highly dependent on the exact implementation of std::string and the lock and usage pattern.
This can be a problem if the apparently harmless use of std::string in a multi-threaded environment causes delays due to hidden locks. The penalty for such locks and context switches in a loop can be huge. But this should never cause deadlocks.
This should not be a problem. Book> 10 years. Meanwhile, thread implementations have improved. Linux-Futexes, for example. in most cases they behave much more smoothly.

Another point: (for me, I don’t know if Meyers would also discuss this ...)
The calculated std::string means that it has copy-on-write semantics. This is generally a good thing. The actual copy is delayed until it is needed. But it also means that the price of a copy must be paid at a point that may be difficult to predict.

+1
source

All Articles