Explanation of Thread.MemoryBarrier() Problem with Out-of-Order Processing

Ok, so after reading Albahari's Threading in C#, I'm trying to get my head around Thread.MemoryBarrier() and out-of-order processing.

In Brian Gideon's answer to Why We Need Thread.MemoryBarrier(), he mentions that the following code loops indefinitely when built in Release mode and run without a debugger attached.

 class Program
 {
     static bool stop = false;

     public static void Main(string[] args)
     {
         var t = new Thread(() =>
         {
             Console.WriteLine("thread begin");
             bool toggle = false;
             while (!stop)
             {
                 // Thread.MemoryBarrier() or Console.WriteLine() fixes issue
                 toggle = !toggle;
             }
             Console.WriteLine("thread end");
         });
         t.Start();
         Thread.Sleep(1000);
         stop = true;
         Console.WriteLine("stop = true");
         Console.WriteLine("waiting...");
         t.Join();
     }
 }

My question is: why does adding Thread.MemoryBarrier() or even Console.WriteLine() inside the while loop fix the problem?

I assume that on a multiprocessor machine the thread works with its own cached copy of the value and never retrieves the updated stop value, because it already has a value in its cache?

Or is it that the main thread never commits the write to memory?

Also, why does Console.WriteLine() fix this? Is it because it also implements a memory barrier?

+6
4 answers

It does not really fix anything. It is a fluke, and quite dangerous to rely on in production code, because it may work, or it may not.

The main problem is in this line:

 static bool stop = false; 

The variable that stops the while loop is not volatile. That means it may or may not be read from memory each time. It can be cached, so that only the last value read is seen by the loop (which may not be the actual current value).

This code

 // Thread.MemoryBarrier() or Console.WriteLine() fixes issue 

may or may not fix the problem on different platforms. The memory barrier or the console write just happens to cause the application to read the latest value on that particular system. It may not do the same elsewhere.


In addition, volatile and Thread.MemoryBarrier() provide only weak guarantees, which means they do not give a 100% guarantee that the value read will always be the latest on all systems and CPUs.

Eric Lippert says:

The true semantics of volatile reads and writes are considerably more complex than I have outlined here; in fact they do not actually guarantee that every processor stops what it is doing and updates caches to/from main memory. Rather, they provide weaker guarantees about how memory accesses before and after reads and writes may be observed to be ordered with respect to each other. Certain operations such as creating a new thread, entering a lock, or using one of the Interlocked family of methods introduce stronger guarantees about observation of ordering. If you want more details, read sections 3.10 and 10.5.3 of the C# 4.0 specification.

+2

The compiler and the CPU can optimize your code by reordering it however they see fit, as long as any changes are consistent for a single thread. This is why you never run into these problems in a single-threaded program.

In your code, you have two threads that use the stop flag. The compiler or the CPU can choose to cache the value in a CPU register on the thread you create, because it can detect that you never write to it on that thread. You need some way of telling the compiler/CPU that the variable is changed by another thread, and that it therefore must not cache the value but has to read it from memory.

There are some easy ways to do this. One is surrounding all access to the stop variable with a lock statement. This creates a full barrier and ensures that each thread sees the current value. Another is to use the Interlocked class to read/write the variable, as this also creates a full barrier; see the sketch below.
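As an illustration, here is a minimal sketch of the Interlocked approach (the sketch is mine, not part of the original answer). Because the Interlocked methods have no bool overloads, the flag becomes an int:

 using System;
 using System.Threading;

 class Program
 {
     static int stop = 0; // 0 = keep running, 1 = stop

     public static void Main(string[] args)
     {
         var t = new Thread(() =>
         {
             bool toggle = false;
             // CompareExchange(ref stop, 0, 0) reads stop with a full barrier
             // without ever changing its value.
             while (Interlocked.CompareExchange(ref stop, 0, 0) == 0)
             {
                 toggle = !toggle;
             }
             Console.WriteLine("thread end");
         });
         t.Start();
         Thread.Sleep(1000);
         Interlocked.Exchange(ref stop, 1); // write with a full barrier
         t.Join();
     }
 }

The lock-based variant is equivalent in effect: wrap every read and write of stop in a lock statement over a shared object, and the fences taken on lock entry and exit keep the value visible across threads.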

There are also certain methods, such as Wait and Join, that also generate memory barriers to prevent reordering. The Albahari book lists these methods.

+3

This example has nothing to do with out-of-order execution. It only shows the effect of a possible compiler optimization of the access to stop, which can be eliminated by simply marking the variable volatile. For more details, see "Memory Reordering Caught in the Act".
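For reference, the fix this answer describes is a one-line change to the declaration in the question's code:

 static volatile bool stop = false; // volatile reads cannot be hoisted out of the loop,
                                    // so the register-caching optimization is disabled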

+1

Let's start with some definitions. The volatile keyword generates an acquire-fence on reads and a release-fence on writes. The fences are defined as follows (a short sketch after the list illustrates them).

  • acquire-fence: a memory barrier in which other reads and writes are not allowed to move before the fence.
  • release-fence: a memory barrier in which other reads and writes are not allowed to move after the fence.
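As an illustration of the two fences (this sketch is mine, not part of the original answer; Volatile.Read and Volatile.Write are the .NET APIs that carry exactly these acquire/release semantics):

 using System;
 using System.Threading;

 class FenceExample
 {
     static int data;
     static bool ready;

     static void Producer()
     {
         data = 42;                       // ordinary write
         Volatile.Write(ref ready, true); // release-fence: the write to data
                                          // cannot move after this write
     }

     static void Consumer()
     {
         if (Volatile.Read(ref ready))    // acquire-fence: the read of data
         {                                // cannot move before this read
             Console.WriteLine(data);     // prints 42 whenever ready was seen as true
         }
     }
 }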

The Thread.MemoryBarrier method generates a full-fence. This means that it produces both an acquire-fence and a release-fence. However, the MSDN documentation says this:

Synchronizes memory access as follows: The processor executing the current thread cannot reorder instructions in such a way that memory accesses prior to the call to MemoryBarrier execute after memory accesses that follow the call to MemoryBarrier.

Interpreting that literally leads us to believe that it only generates a release-fence. So which is it? A full-fence or a half-fence? That is probably a subject for another question. I will work under the assumption that it is a full-fence, because a lot of smart people have made that claim. But, more convincing than that, the BCL itself uses Thread.MemoryBarrier as if it produced a full-fence. So, in this case, the documentation is probably wrong. Even more amusing is that the statement actually implies that instructions before the call could somehow get sandwiched between the call and the instructions after it. That would be absurd. I say this jokingly (but not really): it might be to Microsoft's benefit to have a lawyer review all of the documentation regarding threading. I am sure their language-parsing skills would be put to good use in that area.

Now I'm going to introduce arrow notation to help illustrate the fences in action. An ↑ arrow will represent a release-fence, and a ↓ arrow will represent an acquire-fence. Think of the arrow head as pushing memory accesses away in the direction of the arrow. But, and this is important, memory accesses can move past the tail. Read the definitions of the fences above and convince yourself that the arrows visually represent those definitions.

Next, we will analyze only the loop, since that is the most important part of the code. To do that, I am going to unroll the loop. Here is what it looks like.

 LOOP_TOP:

 // Iteration 1
 read stop into register
 jump-if-true to LOOP_BOTTOM
 ↑
 full-fence // via Thread.MemoryBarrier
 ↓
 read toggle into register
 negate register
 write register to toggle
 goto LOOP_TOP

 // Iteration 2
 read stop into register
 jump-if-true to LOOP_BOTTOM
 ↑
 full-fence // via Thread.MemoryBarrier
 ↓
 read toggle into register
 negate register
 write register to toggle
 goto LOOP_TOP

 ...

 // Iteration N
 read stop into register
 jump-if-true to LOOP_BOTTOM
 ↑
 full-fence // via Thread.MemoryBarrier
 ↓
 read toggle into register
 negate register
 write register to toggle
 goto LOOP_TOP

 LOOP_BOTTOM:

Notice how the call to Thread.MemoryBarrier constrains the movement of some of the memory accesses. For example, the read of toggle cannot move before the read of stop, nor vice versa, because memory accesses are not allowed to move past the arrow heads.

Now imagine what would happen if the full-fence were removed. The C# compiler, the JIT compiler, or the hardware would have much more freedom to move the instructions around. In particular, the hoisting optimization, formally known as loop-invariant code motion, would now be allowed. Basically, the compiler detects that stop never changes, and so the read gets lifted outside of the loop. It is then effectively cached in a register. If the memory barrier were in place, the read would have to move past an arrow head, and the specification specifically disallows that. This is much easier to visualize when the loop is unrolled, as I did above. Remember, the call to Thread.MemoryBarrier occurs on every iteration of the loop, so you cannot just draw conclusions from what happens on a single iteration.
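To make the hoisting concrete, here is a sketch (mine, not from the original answer) of what the transformation effectively does to the loop; it is an illustration, not the literal code the JIT emits:

 // As written:
 while (!stop)
 {
     toggle = !toggle;
 }

 // After loop-invariant code motion, conceptually:
 bool tmp = stop;   // stop is read once, into a register
 while (!tmp)       // the register is never refreshed,
 {                  // so the loop can spin forever
     toggle = !toggle;
 }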

The astute reader will notice that the compiler is still free to swap the toggle and stop reads so that stop gets "refreshed" at the end of the loop instead of the beginning, but that does not change the conceptual behavior of the loop. It has the same semantics and yields the same result.

My question is: why does adding Thread.MemoryBarrier() or even Console.WriteLine() inside the while loop fix the problem?

Because the memory barrier places restrictions on the optimizations the compiler can perform. It inhibits loop-invariant code motion. The assumption is that Console.WriteLine produces a memory barrier as well, which is probably true. Without a memory barrier, the C# compiler, the JIT compiler, or the hardware is free to hoist the read of stop up and out of the loop entirely.

I assume that on a multiprocessor machine the thread works with its own cached copy of the value and never retrieves the updated stop value, because it already has a value in its cache?

In a nutshell... yes. Though keep in mind that this has nothing to do with the number of processors. It can be demonstrated with a single processor.

Or is it that the main thread never commits the write to memory?

No. The main thread will commit the write. The call to Thread.Join guarantees that, because it creates a memory barrier that prevents the write from moving below the join.
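Annotating the relevant lines of the question's code (the comments are mine):

 stop = true;                      // the write from the main thread
 Console.WriteLine("stop = true");
 Console.WriteLine("waiting...");
 t.Join();                         // memory barrier: the write to stop
                                   // cannot move below this call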

Also, why does Console.WriteLine() fix this? Is it because it also implements a memory barrier?

Yes. It probably produces a memory barrier. I keep a list of memory barrier generators here.

+1
