What does `std::kill_dependency` do, and why should I use it?

I've been reading about the new C++11 memory model, and I came across std::kill_dependency (§29.3/14-15). I'm struggling to figure out why I would ever want to use it.

I found an example in the N2664 paper, but it didn't help much.

It starts by showing code without std::kill_dependency. Here, the first line carries a dependency into the second, which carries the dependency into the indexing operation, which in turn carries it into the do_something_with call.

    r1 = x.load(memory_order_consume);
    r2 = r1->index;
    do_something_with(a[r2]);

It then gives an example that uses std::kill_dependency to break the dependency between the second line and the indexing.

    r1 = x.load(memory_order_consume);
    r2 = r1->index;
    do_something_with(a[std::kill_dependency(r2)]);

As far as I can tell, this means that the indexing and the call to do_something_with are not dependency-ordered with respect to the second line. According to N2664:

This allows the compiler to reorder the call to do_something_with, for example by performing speculative optimizations that predict the value of a[r2].

In order to make the call to do_something_with, the value of a[r2] is required. If, hypothetically, the compiler "knows" that the array is filled with zeros, it can optimize that call to do_something_with(0); and reorder the call relative to the other two instructions as it pleases. It could produce any of:

    // 1
    r1 = x.load(memory_order_consume);
    r2 = r1->index;
    do_something_with(0);

    // 2
    r1 = x.load(memory_order_consume);
    do_something_with(0);
    r2 = r1->index;

    // 3
    do_something_with(0);
    r1 = x.load(memory_order_consume);
    r2 = r1->index;

Do I understand correctly?

If do_something_with synchronizes with another thread in some other way, what does this mean with respect to the ordering of the x.load call and that other thread?

Assuming my understanding is correct, one thing still bothers me: when I write code, what would lead me to choose to kill the dependency?

+53
c++ multithreading c++11 memory-model
Aug 22 '11 at 16:16
4 answers

The purpose of memory_order_consume is to ensure that the compiler does not perform certain unfortunate optimizations that could break lock-free algorithms. For example, consider this code:

    int t;
    volatile int a, b;

    t = *x;
    a = t;
    b = t;

A conforming compiler could transform this into:

    a = *x;
    b = *x;

Thus, a may not equal b. It could also do the following:

    t2 = *x;
    // use t2 somewhere

    // later
    t = *x;
    a = t2;
    b = t;

Using load(memory_order_consume), we require that uses of the loaded value not be moved to before the load itself. In other words,

    t = x.load(memory_order_consume);
    a = t;
    b = t;
    assert(a == b); // always true

The standard document then deals with the case where you may want to order only some of the fields of the loaded structure. Example:

    r1 = x.load(memory_order_consume);
    r2 = r1->index;
    do_something_with(a[std::kill_dependency(r2)]);

This tells the compiler that it is allowed to do, in effect, this:

    predicted_r2 = x->index;            // unordered load
    r1 = x;                             // ordered load
    r2 = r1->index;
    do_something_with(a[predicted_r2]); // may be faster than waiting for r2's value to be available

Or even this:

    predicted_r2 = x->index;        // unordered load
    predicted_a = a[predicted_r2];  // get the CPU loading it early on
    r1 = x;                         // ordered load
    r2 = r1->index;                 // ordered load
    do_something_with(predicted_a);

If the compiler knows that do_something_with won't change the result of the loads for r1 or r2, then it can even hoist it all the way up:

    do_something_with(a[x->index]); // completely unordered
    r1 = x;                         // ordered
    r2 = r1->index;                 // ordered

This allows the compiler a little more freedom in optimization.
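
To make the fragments above concrete, here is a minimal compilable sketch of the same pattern; the node struct, the array a, and do_something_with are hypothetical stand-ins for whatever the real code uses:

    #include <atomic>

    struct node { int index; };

    extern int a[];                 // hypothetical array indexed by node::index
    extern std::atomic<node*> x;    // hypothetical pointer published elsewhere,
                                    // e.g. with x.store(p, std::memory_order_release)
    void do_something_with(int);    // hypothetical consumer

    void reader()
    {
        node* r1 = x.load(std::memory_order_consume);  // dependency-ordered load
        int   r2 = r1->index;                          // carries the dependency
        // kill_dependency: the a[r2] access no longer has to be ordered after the load
        do_something_with(a[std::kill_dependency(r2)]);
    }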

+39
Aug 22 '11 at 19:02

In addition to the other answer, I will point out that Scott Meyers, one of the foremost authorities in the C++ community, criticized memory_order_consume quite strongly. He basically said he believed it had no place in the standard. He said there are two cases where memory_order_consume has any effect:

  • Exotic architectures designed to support 1024+ core shared-memory systems.
  • The DEC Alpha

Yes, once again the DEC Alpha finds its way into infamy, by using an optimization not seen in any other chip until, many years later, absurdly specialized machines.

The particular optimization is that those processors allow a field to be dereferenced before the address of that field has actually been obtained (i.e., the CPU can look up x->y before it has even looked up x, using a predicted value of x). It then goes back and checks whether x was the value it expected. On success, it has saved time. On failure, it has to go back and fetch x->y again.

memory_order_consume tells the compiler/architecture that these operations have to be done in order. However, in the most useful case you will actually want to do (x->y).z, where z won't change. memory_order_consume would force the compiler to keep x, y, and z in order. kill_dependency(x->y).z tells the compiler/architecture that it may resume such nefarious reorderings.
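
As a rough illustration of that last expression, here is a minimal sketch; the Outer/Inner types and the atomic pointer x are invented for the example:

    #include <atomic>

    struct Inner { int z; };
    struct Outer { Inner y; };
    extern std::atomic<Outer*> x;   // hypothetical published pointer

    int reader()
    {
        Outer* p = x.load(std::memory_order_consume);
        // Without kill_dependency, the read of .z is dependency-ordered after the
        // pointer load. Wrapping the intermediate value drops that requirement,
        // so the hardware may use a predicted/cached value for the .z access.
        return std::kill_dependency(p->y).z;
    }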

99.999% of developers are likely to never work on a platform where this feature is required (or has any effect at all).

+11
Sep 03 '13 at 5:51

A common use case for kill_dependency arises in the following situation. Suppose you want to atomically update a non-trivial shared data structure. A typical way to do this is to non-atomically create some new data and then atomically swing a pointer in the data structure over to the new data. Once you have done this, you are not going to change the new data until you have swung the pointer away from it to something else (and waited for all readers to vacate). This paradigm is widely used, e.g., read-copy-update (RCU) in the Linux kernel.

Now suppose the reader reads the pointer, reads the new data, and later comes back and reads the pointer again, finding that the pointer hasn't changed. The hardware can't tell that the pointer hasn't been updated again, so under consume semantics it can't use a cached copy of the data but must read it again from memory. (Or, to think of it another way, the hardware and the compiler can't speculatively move the read of the data up before the read of the pointer.)

This is where kill_dependency comes to the rescue. By wrapping the pointer in kill_dependency, you create a value that will no longer propagate a dependency, allowing accesses through the pointer to use a cached copy of the new data.
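
A minimal sketch of the reader side of that pattern, assuming an invented Data type and a shared atomic pointer that the writer swings to new data:

    #include <atomic>

    struct Data { int value; };
    extern std::atomic<Data*> shared;   // hypothetical pointer swung by the writer

    int reader()
    {
        Data* p1 = shared.load(std::memory_order_consume);
        int first = p1->value;          // dependency-ordered read through the pointer

        // ... later, the same reader looks at the pointer again ...
        Data* p2 = shared.load(std::memory_order_consume);
        if (p2 == p1) {
            // The pointer has not been swung away, so under the RCU discipline the
            // data behind it cannot have changed. kill_dependency breaks the
            // dependency chain, letting the implementation reuse cached data
            // instead of re-reading it from memory.
            return first + std::kill_dependency(p2)->value;
        }
        return first + p2->value;
    }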

+3
Nov 07 '13 at 14:18

I assume that it permits this optimization:

    r1 = x.load(memory_order_consume);
    do_something_with(a[r1->index]);
0
Aug 22 '11 at 16:23


