A few things. First, "will be flushed to memory" is pretty wrong. It almost never goes all the way to main memory: it usually drains the store buffer to L1, and from there the cache coherence protocol keeps the data synchronized between all caches. If you find it easier to understand the concept in those terms, that's fine; just know that it is slightly different, and faster.
Good question about why [StoreLoad] exists; maybe this will clarify things a bit. volatile is really all about fences. Here is an example of what will happen:
int tmp = i; // volatile load
// [LoadStore]
// [LoadLoad]
That is what happens for a volatile load; below is what happens for a volatile store:
// [StoreStore]
// [LoadStore]
i = tmp; // volatile store
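To experiment with these two fence sets, they can be roughly approximated in Java 9+ with the static fences on java.lang.invoke.VarHandle: acquireFence() keeps earlier loads from being reordered with later loads and stores ([LoadLoad] + [LoadStore]), and releaseFence() keeps earlier loads and stores from being reordered with later stores ([LoadStore] + [StoreStore]). This is only a sketch: the class FenceSketch and its methods are hypothetical, and plain accesses wrapped in fences are not formally the same thing as a volatile field under the Java Memory Model.

```java
import java.lang.invoke.VarHandle;

// Hypothetical sketch: explicit fences (Java 9+) around plain accesses,
// approximating the barriers described for a volatile load and store.
// Not formally equivalent to a volatile field under the JMM.
class FenceSketch {
    static int i; // plain field standing in for the volatile `i` from the text

    static int volatileLikeLoad() {
        int tmp = i;              // the load itself
        VarHandle.acquireFence(); // ~ [LoadStore] + [LoadLoad]
        return tmp;
    }

    static void volatileLikeStore(int tmp) {
        VarHandle.releaseFence(); // ~ [StoreStore] + [LoadStore]
        i = tmp;                  // the store itself
    }

    public static void main(String[] args) {
        volatileLikeStore(42);
        System.out.println(volatileLikeLoad()); // single thread: prints 42
    }
}
```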
But that's not the whole story. There has to be sequential consistency, so any sane implementation guarantees that volatile accesses themselves are not reordered with each other, which adds two more fences:
// [StoreLoad] -- this one
int tmp = i; // volatile load
// [LoadStore]
// [LoadLoad]
And one more here:
// [StoreStore]
// [LoadStore]
i = tmp; // volatile store
// [StoreLoad] -- and this one
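The two extra [StoreLoad] fences can be sketched the same way with VarHandle.fullFence(), the only VarHandle fence that also keeps earlier stores from being reordered with later loads. Again, SeqCstFenceSketch is a hypothetical illustration of the fence placement above, not how any particular JIT literally compiles volatile.

```java
import java.lang.invoke.VarHandle;

// Hypothetical sketch: the full fence set from the text, including both
// [StoreLoad] fences, using VarHandle.fullFence() for StoreLoad (Java 9+).
class SeqCstFenceSketch {
    static int i; // plain field standing in for the volatile `i`

    static int seqCstLikeLoad() {
        VarHandle.fullFence();    // [StoreLoad] (the leading one)
        int tmp = i;              // the load itself
        VarHandle.acquireFence(); // [LoadStore] + [LoadLoad]
        return tmp;
    }

    static void seqCstLikeStore(int tmp) {
        VarHandle.releaseFence(); // [StoreStore] + [LoadStore]
        i = tmp;                  // the store itself
        VarHandle.fullFence();    // [StoreLoad] (the trailing one)
    }
}
```

As the JSR-133 Cookbook points out, a single StoreLoad between a volatile store and a following volatile load is enough, so real implementations usually emit it on only one side rather than on both.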
Now, it turns out that on x86, 3 of these 4 barriers are free, because x86 has a strong memory model (TSO): they only restrict compiler reordering and need no instruction. The only one that actually has to be emitted is StoreLoad.
Usually, mfence is a good option for StoreLoad on x86, but the same guarantee can be had with a lock add, which is why you see it: it is basically the StoreLoad barrier. And yes, you are right in your last sentence: a weaker memory model would also require the StoreStore barrier. On a side note, that is also what makes safe publication of a reference through final fields work: on exit from the constructor, two fences are inserted, LoadStore and StoreStore.
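A minimal sketch of the final-field case, with hypothetical names Holder and FinalPublication. The field shared is deliberately plain, so publication is racy: a reader may see null, but if it sees the Holder at all, final-field semantics guarantee it sees value == 42 and never the default 0. The barrier placement in the comment follows the description above, not any specific JIT output.

```java
// Hypothetical sketch of safe publication through a final field.
class Holder {
    final int value; // final: per the answer, [LoadStore] + [StoreStore] follow the constructor

    Holder(int v) {
        value = v;
    }
}

class FinalPublication {
    static Holder shared; // deliberately plain (not volatile): racy publication

    static void publish() {
        shared = new Holder(42);
    }

    // May run concurrently with publish(): it can observe null (returns -1),
    // but if it observes the object, value is guaranteed to be 42, never 0.
    static int tryRead() {
        Holder h = shared; // read the racy field exactly once
        return h == null ? -1 : h.value;
    }

    public static void main(String[] args) {
        Thread writer = new Thread(FinalPublication::publish);
        writer.start();
        System.out.println(tryRead()); // -1 or 42, depending on timing
    }
}
```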
EDIT
Suppose you have this case:
[StoreStore]
[LoadStore]
int x = i;
In principle, there is no barrier that would prevent the volatile store from being reordered with the volatile load (i.e., the volatile load could be performed first), and that can cause problems: sequential consistency would be broken.
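This store-then-load problem is the classic store-buffering litmus test. A minimal sketch, assuming hypothetical fields x and y: each thread does a volatile store followed by a volatile load of the other field. Because volatile accesses are sequentially consistent, the outcome r1 == 0 && r2 == 0 is forbidden; without [StoreLoad], each store could sit in its core's store buffer while the following load runs, and both threads could read 0. The harness only checks the guarantee; starting fresh threads each round makes the race window tiny, so a dedicated tool such as OpenJDK's jcstress is the better choice for actually provoking reorderings with plain fields.

```java
// Hypothetical store-buffering litmus test: a volatile store followed by a
// volatile load of the other field, in two threads.
class StoreBuffering {
    static volatile int x, y;
    static int r1, r2; // plain: read only after join(), which gives happens-before

    // Returns how many of `rounds` runs ended with r1 == 0 && r2 == 0.
    // With volatile x and y that outcome is forbidden, so this must return 0.
    static int countForbidden(int rounds) {
        int forbidden = 0;
        for (int n = 0; n < rounds; n++) {
            x = 0;
            y = 0;
            Thread t1 = new Thread(() -> { x = 1; r1 = y; }); // store x, then load y
            Thread t2 = new Thread(() -> { y = 1; r2 = x; }); // store y, then load x
            t1.start();
            t2.start();
            try {
                t1.join();
                t2.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            if (r1 == 0 && r2 == 0) {
                forbidden++;
            }
        }
        return forbidden;
    }

    public static void main(String[] args) {
        System.out.println(countForbidden(1_000)); // prints 0
    }
}
```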
By the way, you seem to be missing a point here (if I'm not mistaken) with "Every action after volatile load won't be reordered before volatile load is visible". Reordering is not possible with the volatile access itself, but the other operations are still free to be reordered. Let me give you an example:
int y = 0;
int tmp = i; // volatile load
// [LoadStore]
// [LoadLoad]
int x = 3; // plain store
y = 4; // plain store
The last two operations, x = 3 and y = 4, are completely free to be reordered: they cannot float above the volatile load, but they can be reordered among themselves. So the following would be completely legal:
int y = 0;
int tmp = i; // volatile load
// [LoadStore]
// [LoadLoad]
// see how they have been inverted here...
y = 4; // plain store
int x = 3; // plain store
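A small message-passing sketch of the same rule, with hypothetical fields data, ready and y: accesses after the volatile load of ready cannot float above it, so a reader that sees ready == true is guaranteed to see data == 42. The plain load of data and the plain store to y that follow are still free to be reordered with each other.

```java
// Hypothetical message-passing sketch: a plain payload published via a volatile flag.
class MessagePassing {
    static int data;               // plain payload
    static int y;                  // unrelated plain field
    static volatile boolean ready; // volatile flag

    static void writer() {
        data = 42;    // plain store: [StoreStore] keeps it above the volatile store
        ready = true; // volatile store
    }

    // Returns -1 if the flag is not yet visible, otherwise the payload.
    static int reader() {
        if (!ready) {     // volatile load
            return -1;
        }
        // [LoadStore] + [LoadLoad]: nothing below may float above that load,
        // but these two plain accesses may still be reordered with each other.
        int seen = data;  // plain load: guaranteed to see 42 once ready is true
        y = 4;            // plain store
        return seen;
    }

    public static void main(String[] args) {
        Thread w = new Thread(MessagePassing::writer);
        w.start();
        System.out.println(reader()); // -1 or 42, never 0
    }
}
```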