Maybe it's just me, but the example in the man 2 membarrier page seems meaningless.
In principle, membarrier() is an asymmetric memory barrier: given two coordinating pieces of code (call them a fast path and a slow path), it lets you move the entire hardware cost of the barrier onto the slow path and leave the fast path with only a compiler barrier¹. There are several ways the membarrier behavior can be implemented, for example by sending an IPI to every CPU involved, or by waiting for whatever is running on each CPU to be scheduled out, but the exact implementation details are not important here.
Now here is the example of the conversion given in the man page:
Original code
static volatile int a, b;

static void
fast_path(void)
{
    int read_a, read_b;

    read_b = b;
    asm volatile ("mfence" : : : "memory");
    read_a = a;

    /* Seeing b == 1 must imply that a == 1 is visible as well. */
    if (read_b == 1 && read_a == 0)
        abort();
}

static void
slow_path(void)
{
    a = 1;
    asm volatile ("mfence" : : : "memory");
    b = 1;
}
Converted Code
(the membarrier() syscall wrapper and the initialization code are omitted; a sketch of them follows the listing below)
static volatile int a, b;

static void
fast_path(void)
{
    int read_a, read_b;

    read_b = b;
    asm volatile ("" : : : "memory");   /* compiler barrier only */
    read_a = a;

    /* Seeing b == 1 must imply that a == 1 is visible as well. */
    if (read_b == 1 && read_a == 0)
        abort();
}

static void
slow_path(void)
{
    a = 1;
    membarrier(MEMBARRIER_CMD_SHARED, 0);   /* full barrier on every CPU */
    b = 1;
}
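For reference, the omitted part is essentially the usual raw-syscall wrapper plus a startup capability check (a sketch from memory of the man page's boilerplate, not a verbatim copy; glibc has no membarrier() wrapper, so the example defines its own):

#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>

static int
membarrier(int cmd, unsigned int flags)
{
    return syscall(__NR_membarrier, cmd, flags);
}

static int
init_membarrier(void)
{
    int ret;

    /* MEMBARRIER_CMD_QUERY returns a bitmask of supported commands. */
    ret = membarrier(MEMBARRIER_CMD_QUERY, 0);
    if (ret < 0 || !(ret & MEMBARRIER_CMD_SHARED))
        return -1;   /* syscall unavailable or CMD_SHARED unsupported */
    return 0;
}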
In both versions, slow_path performs two writes (a, then b) separated by a barrier, and fast_path performs two reads (b, then a), also separated by a barrier.
However, the x86 memory model does not allow loads to be reordered with other loads, or stores with other stores! So, as far as I can tell, membarrier() is not required at all in this scenario, and the mfence is not needed in the original code either. It seems that plain compiler barriers² would be enough in both places.
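In other words, if the code only ever needs to run on x86, I would expect the following slow path, with nothing but a compiler barrier, to be just as correct as either version above (my own sketch, not from the man page):

static void
slow_path(void)
{
    a = 1;
    asm volatile ("" : : : "memory");   /* x86 already orders store-store */
    b = 1;
}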
An example that really makes sense, IMO, would have a store followed by a load, separated by a barrier, on the fast path.
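For instance, something like the following Dekker-style pair (a hypothetical sketch; the names try_fast, try_slow, fast_flag and slow_flag are mine): the fast path stores and then loads, which is exactly the reordering x86 permits, so the plain version would genuinely need an mfence on the fast path, and the membarrier() conversion would genuinely save something by demoting it to a compiler barrier:

static volatile int fast_flag, slow_flag;

/* Each side sets its own flag and then checks the other's;
   the guarantee is that at most one of them can see the other's flag as 0. */

static int
try_fast(void)
{
    fast_flag = 1;
    asm volatile ("" : : : "memory");   /* store->load, yet only a compiler barrier */
    return slow_flag == 0;              /* success iff the slow side is not active */
}

static int
try_slow(void)
{
    slow_flag = 1;
    membarrier(MEMBARRIER_CMD_SHARED, 0);   /* full barrier on every CPU */
    return fast_flag == 0;                  /* success iff the fast side is not active */
}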
What am I missing?
¹ That is, a pure compiler barrier such as asm volatile ("" : : : "memory") — it only stops the compiler from reordering the accesses, emits no instructions, and is essentially free at run time.
² Assuming, of course, that only x86 matters here; the original code already uses mfence, so it is x86-specific anyway, even though membarrier() itself is by no means limited to x86.