Why does false sharing occur if the variable being modified by the thread is marked volatile?

I read an article by Martin Thompson explaining false sharing:

http://mechanical-sympathy.blogspot.co.uk/2011/07/false-sharing.html

    public final class FalseSharing implements Runnable {
        public final static int NUM_THREADS = 4; // change
        public final static long ITERATIONS = 500L * 1000L * 1000L;
        private final int arrayIndex;
        private static VolatileLong[] longs = new VolatileLong[NUM_THREADS];

        static {
            for (int i = 0; i < longs.length; i++) {
                longs[i] = new VolatileLong();
            }
        }

        public FalseSharing(final int arrayIndex) {
            this.arrayIndex = arrayIndex;
        }

        public static void main(final String[] args) throws Exception {
            final long start = System.nanoTime();
            runTest();
            System.out.println("duration = " + (System.nanoTime() - start));
        }

        private static void runTest() throws InterruptedException {
            Thread[] threads = new Thread[NUM_THREADS];
            for (int i = 0; i < threads.length; i++) {
                threads[i] = new Thread(new FalseSharing(i));
            }
            for (Thread t : threads) {
                t.start();
            }
            for (Thread t : threads) {
                t.join();
            }
        }

        public void run() {
            long i = ITERATIONS + 1;
            while (0 != --i) {
                longs[arrayIndex].value = i;
            }
        }

        public final static class VolatileLong {
            public volatile long value = 0L;
            public long p1, p2, p3, p4, p5, p6; // comment out
        }
    }

The example demonstrates the slowdown caused by multiple threads invalidating each other's cache lines, even though each thread updates only a single variable.

Figure 1 above illustrates the false sharing problem. A thread running on core 1 wants to update variable X, and a thread on core 2 wants to update variable Y. Unfortunately, these two hot variables reside on the same cache line. Each thread will race for ownership of the cache line so that it can update it. If core 1 gains ownership, the cache subsystem must invalidate the corresponding cache line for core 2. When core 2 gains ownership and performs its update, core 1's copy of the cache line is invalidated in turn. This ping-pongs back and forth through the L3 cache, significantly hurting performance. The problem is exacerbated further if the competing cores are on different sockets, because then the traffic also has to cross the socket interconnect.

My question is the following. If all the updated variables are volatile, why does adding the padding improve performance? My understanding is that a volatile variable is always written to and read from main memory. Therefore I would assume that every write and read of any variable in this example invalidates the core's cache line anyway.

So, according to my understanding: if thread one invalidates thread two's cache line, that does not become apparent to thread two until it reads a value from that cache line. The value it reads is volatile, so the cache miss effectively forces a read from main memory in any case.

Where did I make a mistake in my understanding?

thanks

1 answer

If all the updated variables are volatile, why does adding the padding improve performance?

So there are two things here:

  • We are dealing with an array of VolatileLong objects, with each thread working on its own VolatileLong (see private final int arrayIndex ).
  • Each VolatileLong object has a single volatile field.

volatile access means that the threads must both invalidate the cache line that contains their volatile long value and lock that cache line in order to update it. As stated in the article, a cache line is typically ~64 bytes or so.

The article says that by adding padding to the VolatileLong object, the value that each thread locks is moved onto a different cache line. So although the different threads still cross memory barriers when they assign their volatile long value , those values are on different cache lines and therefore the writes will not consume excessive L2 cache bandwidth.

Thus the performance gain comes from the fact that, although the threads still lock their cache lines to update the volatile fields, those cache lines now cover different blocks of memory, so the threads no longer contend with each other's locks and no longer invalidate each other's caches.
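The padding idea above can be sketched in isolation. This is a minimal sketch of my own, not the article's code: the class and method names (PaddedCounters, runPair) are made up for illustration, and the p1-p7 fields assume 64-byte cache lines. Two threads each hammer their own volatile long; the padding is what pushes the two hot fields onto separate cache lines.

```java
// Sketch (hypothetical names): each counter is padded so that its volatile
// field occupies its own 64-byte cache line, avoiding false sharing between
// the two writer threads.
public class PaddedCounters {

    static final class PaddedLong {
        volatile long value = 0L;
        long p1, p2, p3, p4, p5, p6, p7; // padding; assumes 64-byte cache lines
    }

    // Runs one writer thread per counter and returns the final values.
    static long[] runPair(long iterations) throws InterruptedException {
        final PaddedLong a = new PaddedLong();
        final PaddedLong b = new PaddedLong();
        Thread t1 = new Thread(() -> { for (long i = 1; i <= iterations; i++) a.value = i; });
        Thread t2 = new Thread(() -> { for (long i = 1; i <= iterations; i++) b.value = i; });
        t1.start(); t2.start();
        t1.join(); t2.join();
        return new long[] { a.value, b.value };
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.nanoTime();
        long[] result = runPair(10_000_000L);
        System.out.println("a=" + result[0] + " b=" + result[1]
                + " ms=" + (System.nanoTime() - start) / 1_000_000);
    }
}
```

Note that on JDK 8 the same effect can be requested with the @sun.misc.Contended annotation (enabled via -XX:-RestrictContended), which asks the JVM to pad the field for you instead of relying on hand-written filler fields that a future JIT might optimize away.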

