Let's start with some definitions. The volatile keyword performs an acquire-fence on reads and a release-fence on writes. Those fences are defined as follows.
- Acquire-fence: a memory barrier in which other reads and writes are not allowed to move before the fence.
- Release-fence: a memory barrier in which other reads and writes are not allowed to move after the fence.
The Thread.MemoryBarrier method creates a full fence. That means it generates both an acquire-fence and a release-fence. Unfortunately, the MSDN documentation says this:
"Synchronizes memory access as follows: The processor executing the current thread cannot reorder instructions in such a way that memory accesses prior to the call to MemoryBarrier execute after memory accesses that follow the call to MemoryBarrier."
An interpretation of that leads us to believe it only generates a release-fence. So which is it? A full fence or a half fence? That is probably a topic for another question. I will work under the assumption that it creates a full fence, because a lot of smart people have made that assertion. But, more convincingly, the BCL itself uses Thread.MemoryBarrier as if it created a full fence. So in this case the documentation is probably wrong. Even more amusing is that the statement, read literally, implies that instructions before the call could somehow get sandwiched between the call and the instructions after it. That would be absurd. I say jokingly (but only half jokingly) that it might benefit Microsoft to have a lawyer review all of the documentation regarding threading. I am sure their legal skills could be put to good use in that area.
Now I am going to introduce an arrow notation to help illustrate the fences in action. An ↑ arrow will represent a release-fence and a ↓ arrow will represent an acquire-fence. Think of the arrow head as pushing memory accesses away in the direction of the arrow. But, and this is important, memory accesses can move past the tail. Read the definitions of the fences above and convince yourself that the arrows visually represent those definitions.
From here on we will analyze only the loop, since it is the most important part of the code. To do that I am going to unwind the loop. Here is what it looks like.
```
LOOP_TOP:

// Iteration 1
read stop into register
jump-if-true to LOOP_BOTTOM
↑
full-fence // via Thread.MemoryBarrier
↓
read toggle into register
negate register
write register to toggle
goto LOOP_TOP

// Iteration 2
read stop into register
jump-if-true to LOOP_BOTTOM
↑
full-fence // via Thread.MemoryBarrier
↓
read toggle into register
negate register
write register to toggle
goto LOOP_TOP

...

// Iteration N
read stop into register
jump-if-true to LOOP_BOTTOM
↑
full-fence // via Thread.MemoryBarrier
↓
read toggle into register
negate register
write register to toggle
goto LOOP_TOP

LOOP_BOTTOM:
```
Notice how the call to Thread.MemoryBarrier constrains the movement of certain memory accesses. For example, the read of toggle cannot move before the read of stop, or vice versa, because memory accesses are not allowed to move past an arrow head.
Now consider what would happen if the full fence were removed. The C# compiler, the JIT compiler, or the hardware now has a lot more freedom to move instructions around. In particular, the hoisting optimization, formally known as loop invariant code motion, can now kick in. Basically, the compiler detects that stop never changes, so the read gets lifted up and out of the loop. It is then effectively cached in a register. If the memory barrier were in place, the read would have to move up through an arrow head, and the specification specifically disallows that. This is much easier to visualize when you unwind the loop, as I did above. Remember that the call to Thread.MemoryBarrier occurs on every iteration of the loop, so you cannot draw conclusions about what happens from looking at a single iteration alone.
An astute reader will notice that the compiler is free to swap the reads of toggle and stop so that stop gets "refreshed" at the end of the loop instead of the beginning, but that is irrelevant to the overall behavior of the loop. It has the same semantics and yields the same result.
My question is: why does adding Thread.MemoryBarrier(), or even Console.WriteLine(), inside the while loop fix the problem?
Because the memory barrier places restrictions on the optimizations the compiler is allowed to perform. It inhibits the loop invariant code motion optimization. Console.WriteLine is assumed to create a memory barrier as well, which it probably does. Without a memory barrier, the C# compiler, JIT compiler, or hardware is free to hoist the read of stop up and out of the loop.
I am assuming that since on a multiprocessor machine the thread runs with its own cache of values, it never retrieves the updated value of stop because it already has a value in its cache?
In a nutshell... yes. Though keep in mind that this has nothing to do with the number of processors. It can be demonstrated on a single processor.
Or is it that the main thread does not commit the write to memory?
No. The main thread will commit the write. The call to Thread.Join guarantees that, because it creates a memory barrier that prevents the write from moving below (after) the join.
Also, why does Console.WriteLine() fix this? Is it because it also implements a MemoryBarrier?
Yes. It probably creates a memory barrier. I have kept a list of memory barrier generators here.