static double s_x;
It is much harder to demonstrate the effect when you use a double. The CPU uses dedicated instructions to load and store a double, FLD and FSTP respectively. It is much easier with a long, since there is no single instruction that loads or stores a 64-bit integer in 32-bit mode. To observe tearing on a double, the variable's address has to be misaligned so that it straddles a CPU cache line boundary.
That will never happen with the declaration you used: the JIT compiler ensures that the double is properly aligned, stored at an address that is a multiple of 8. You could store it in a field of a class, since the GC allocator only aligns to 4 in 32-bit mode. But that's a crap shoot.
The best way to do it is to intentionally misalign the double by using a pointer. Put unsafe in front of the Program class and make it look similar to this:
static double* s_x;

static void Main(string[] args) {
    var mem = Marshal.AllocCoTaskMem(100);
    s_x = (double*)((long)(mem) + 28);
    TestTearingDouble();
}

ThreadA:
    *s_x = ((i & 1) == 0) ? 0.0 : double.MaxValue;

ThreadB:
    double x = *s_x;
This still won't guarantee a good misalignment (hehe), since there is no way to control exactly where AllocCoTaskMem() will place the allocation relative to the start of a CPU cache line. It also depends on the cache associativity in your CPU core (mine is a Core i5). You will have to tinker with the offset; I got the value 28 by experiment. The value should be divisible by 4 but not by 8 to truly mimic the behavior of the GC heap. Keep adding 8 to the value until the double straddles a cache line and triggers the assert.
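The address that AllocCoTaskMem() returns is visible at run time, so part of the tinkering can be done with arithmetic. Below is a minimal sketch, assuming 64-byte cache lines (typical for x86 cores, but an assumption here, not something the original program checks). CacheLineMath and its methods are hypothetical helpers, not part of the original code.

```csharp
using System;

static class CacheLineMath
{
    // Assumption: 64-byte cache lines. Adjust if your CPU differs.
    const long CacheLineSize = 64;

    // True when the 8 bytes starting at 'address' touch two cache lines.
    // Assumes a non-negative address.
    public static bool Straddles(long address)
    {
        return address / CacheLineSize != (address + 7) / CacheLineSize;
    }

    // Returns the smallest offset into a block of 'blockSize' bytes at which
    // a double is 4-aligned but not 8-aligned (like a GC heap field in 32-bit
    // mode) and straddles a cache line, or -1 if the block is too small.
    public static int FindStraddlingOffset(long baseAddress, int blockSize)
    {
        for (int offset = 0; offset + 8 <= blockSize; offset++)
        {
            long address = baseAddress + offset;
            if (address % 8 == 4 && Straddles(address))
                return offset;
        }
        return -1;
    }
}
```

With that in place, the pointer could be set up as s_x = (double*)((long)mem + CacheLineMath.FindStraddlingOffset((long)mem, 100)); instead of hard-coding an offset. If the assert still does not fire, the cache line size assumption is the first thing to revisit.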
To make it less artificial, you would have to write a program that stores the double in a field of a class and gets the garbage collector to move it around in memory so that it ends up misaligned. It is kind of hard to come up with a sample program that ensures this happens.
Also note how your program can demonstrate a problem called false sharing. Comment out the Start() method call for thread B and note how much faster thread A runs. You are seeing the cost of the CPU keeping the cache line consistent between the cores. Sharing is intended here, since the threads access the same variable. Real false sharing happens when threads access different variables that are stored in the same cache line. This is also why alignment matters: you can only observe tearing of a double when part of it is in one cache line and part of it is in another.
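To see real false sharing rather than intended sharing, here is a minimal sketch in which two threads update different array slots. FalseSharingSketch and its methods are hypothetical names, not part of the original program, and the sketch assumes 64-byte cache lines and .NET 4.5 or later (for the Volatile class).

```csharp
using System;
using System.Diagnostics;
using System.Threading;

static class FalseSharingSketch
{
    // Two threads each increment their own slot 'iterations' times.
    // The slots never overlap, so any slowdown comes from the cache line
    // they may share, not from the data itself.
    public static long[] Run(int slotA, int slotB, int iterations)
    {
        var slots = new long[32];
        Thread a = StartWorker(slots, slotA, iterations);
        Thread b = StartWorker(slots, slotB, iterations);
        a.Join();
        b.Join();
        return slots;
    }

    static Thread StartWorker(long[] slots, int slot, int iterations)
    {
        var thread = new Thread(() =>
        {
            for (int i = 0; i < iterations; i++)
            {
                // Volatile access forces a real memory read and write
                // on every iteration.
                Volatile.Write(ref slots[slot], Volatile.Read(ref slots[slot]) + 1);
            }
        });
        thread.Start();
        return thread;
    }

    // Wall-clock time in milliseconds for one run.
    public static long TimeMs(int slotA, int slotB, int iterations)
    {
        var watch = Stopwatch.StartNew();
        Run(slotA, slotB, iterations);
        return watch.ElapsedMilliseconds;
    }
}
```

Comparing TimeMs(0, 1, 100000000) with TimeMs(0, 16, 100000000) should show the effect: slots 0 and 1 are 8 bytes apart and will almost always share a cache line, while slots 0 and 16 are 128 bytes apart and cannot share one if lines are 64 bytes. The actual numbers depend on the machine.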
Hans Passant, Jan 29 '12 at 15:08