How do CPUs on different sockets interact?

I am tuning the performance of my parallel Java program. I'm curious about the effects of architecture.

Given a machine with two processor sockets, each holding a quad-core Intel Xeon processor:

  • How do the two processors interact, and how fast can they exchange data?
  • How fast can two cores on the same chip interact?
  • Are the four cores on the same chip equivalent in terms of sharing or accessing memory?
+7
2 answers

1) How do the two CPUs interact, and how fast do they exchange data?

In most cases they interact through memory, that is, through the nearest level of the memory hierarchy that both can reach. (System memory counts as a shared level on both SMP and NUMA systems. On NUMA, memory behind another chip's memory controller is still accessible; the access is simply non-uniform, meaning slower.)

2) How quickly will two cores on the same chip interact?

Cores on the same chip typically share an L2 or L3 cache. Cores on different chips exchange data through memory, or through cache-to-cache transfers driven by the cache coherence protocol.

Thus, in case 1 (different chips), the bandwidth for data passed between the CPUs will be close to that of ordinary memory reads and writes. In case 2 (same chip), it can be higher, up to the read/write speed of the shared cache.

The communication latency will be several hundred CPU cycles in case 1 and several dozen in case 2.
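The latency difference described above can be observed with a simple ping-pong test, in which two threads hand a flag back and forth so that the cache line holding it must move between the cores running them. The following is only a minimal sketch: the class and method names are hypothetical, and standard Java offers no way to pin a thread to a core, so which case you measure depends on where the OS scheduler places the two threads (an external tool such as `taskset` or `numactl` on Linux can restrict the JVM to chosen cores).

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: measures the average round-trip time of a flag
// handed back and forth between two threads.
class PingPong {
    // Shared flag: 1 means "responder's turn", 0 means "main thread's turn".
    private static final AtomicInteger turn = new AtomicInteger(0);

    /** Returns the average round-trip time in nanoseconds over the given rounds. */
    static double measure(int rounds) {
        turn.set(0);
        Thread responder = new Thread(() -> {
            for (int i = 0; i < rounds; i++) {
                while (turn.get() != 1) {
                    Thread.onSpinWait(); // busy-wait hint (Java 9+)
                }
                turn.set(0); // hand the flag back
            }
        });
        responder.start();

        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            turn.set(1); // hand the flag to the responder
            while (turn.get() != 0) {
                Thread.onSpinWait();
            }
        }
        long elapsed = System.nanoTime() - start;

        try {
            responder.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return (double) elapsed / rounds;
    }

    public static void main(String[] args) {
        measure(100_000); // warm-up so the JIT compiles both loops
        System.out.printf("average round trip: %.1f ns%n", measure(1_000_000));
    }
}
```

Running it once with the JVM restricted to two cores of one socket and once with one core on each socket should show the gap between the two cases, although the absolute numbers also include JVM and scheduler overhead.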

3) Are the four cores on the same chip equivalent in terms of communication or memory access?

All four cores of the same chip usually have an equivalent distance to RAM. However, this depends on the architecture and implementation of the chip; in some older designs, for example, a multi-core chip was really two chips placed in one package.

+3

How to schedule threads onto cores for optimal memory performance depends on the memory access pattern, and it is usually not worth the trouble. If your program is in Java, you probably won't have the level of control needed to achieve optimal performance.
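What a Java program can control is its own memory access pattern. As a rough illustration, and not a tuning recipe, the sketch below gives each worker thread one contiguous slice of an array and a thread-local accumulator, so that threads do not write to shared data while they run. The class name `ChunkedSum` and its method are hypothetical, and where the threads actually execute is still decided by the JVM and the OS.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: sums an array with one contiguous slice per thread.
class ChunkedSum {
    static long sum(long[] data, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Slice size, rounded up so that all elements are covered.
            int chunk = (data.length + threads - 1) / threads;
            List<Future<Long>> parts = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int from = Math.min(data.length, t * chunk);
                final int to = Math.min(data.length, from + chunk);
                parts.add(pool.submit(() -> {
                    long local = 0; // thread-local accumulator, no shared writes
                    for (int i = from; i < to; i++) {
                        local += data[i];
                    }
                    return local;
                }));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get(); // combine the partial sums once, at the end
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long[] data = new long[1_000_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }
        int threads = Runtime.getRuntime().availableProcessors();
        System.out.println(sum(data, threads));
    }
}
```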

Modern processors have built-in memory controllers, and modern multiprocessor systems have distributed memory. It is called

Non-Uniform Memory Access (NUMA)

In modern Intel multiprocessor systems, communication between sockets is carried out over QPI:

QuickPath Interconnect (QPI)

QPI is Intel's point-to-point interconnect architecture, which defines how the sockets communicate. AMD's equivalent is HyperTransport. Here you can learn more about the different architectures:

System architecture

A memory access that misses the level 1 data cache can be served by the level 2 cache (on the same socket), or by what Intel calls the “last level cache” (LLC) on the socket whose memory controller owns that memory address. A hit in the LLC of another socket can cost several tens of processor cycles, but that is still much faster than an access to DRAM (more than a hundred processor cycles).
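The steps in this hierarchy can be made visible with a pointer-chasing loop over working sets of increasing size: every load depends on the previous one, so the time per step approximates the latency of whichever level the working set fits into. The sketch below is only an approximation under stated assumptions: the class name `CacheLatency` is hypothetical, the index array packs several 4-byte entries into each cache line, and JIT compilation, page size, and the particular CPU all affect the absolute numbers.

```java
import java.util.Random;

// Hypothetical sketch: estimates access latency for different working-set sizes.
class CacheLatency {
    /** Builds a random permutation forming one single cycle (Sattolo's algorithm). */
    static int[] randomCycle(int n, long seed) {
        int[] next = new int[n];
        for (int i = 0; i < n; i++) {
            next[i] = i;
        }
        Random rnd = new Random(seed);
        for (int i = n - 1; i > 0; i--) {
            int j = rnd.nextInt(i); // 0 <= j < i, which guarantees a single cycle
            int tmp = next[i];
            next[i] = next[j];
            next[j] = tmp;
        }
        return next;
    }

    /** Follows the chain for the given number of steps, starting at index 0. */
    static int chase(int[] next, long steps) {
        int p = 0;
        for (long s = 0; s < steps; s++) {
            p = next[p]; // each load depends on the result of the previous one
        }
        return p;
    }

    public static void main(String[] args) {
        // Working sets from 16 KiB to 64 MiB (4 bytes per int).
        for (int n = 1 << 12; n <= 1 << 24; n <<= 2) {
            int[] next = randomCycle(n, 42);
            chase(next, 5_000_000L); // warm-up for the JIT and the caches
            long steps = 20_000_000L;
            long t0 = System.nanoTime();
            int end = chase(next, steps);
            long t1 = System.nanoTime();
            // Printing 'end' keeps the JIT from discarding the loop.
            System.out.printf("%8d KiB: %.2f ns/access (end=%d)%n",
                    n * 4 / 1024, (double) (t1 - t0) / steps, end);
        }
    }
}
```

The random single-cycle order is there to defeat hardware prefetching; with a sequential order the prefetcher would hide most of the latency this loop is trying to expose.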

+8
