Let me start with an example.
Let's say we have a 4-socket ccNUMA (cache-coherent non-uniform memory access) system, where each socket has 4 cores and 2 GB of local RAM.
Let's say one process runs on each socket, and all of them share a memory region allocated in P2's RAM, designated SHM. This means that any load/store to this region leads to a lookup in P2's directory, right? If so, when does this lookup happen, and is it equivalent to a RAM access in terms of latency? Where is this directory physically located? (See below.)
A more specific example: let's say P2 executes a LOAD on SHM, and the data is brought into P2's L3 cache tagged as (O)wner. Now let's say P4 also executes a LOAD on the same SHM. This forces P4 to look up P2's directory, and since the data is marked there as "Private" to P2, my question is:
Can P4 ever get SHM from P2's RAM, or does it ALWAYS get the data from P2's L3 cache?
If it always receives the data from the L3 cache, wouldn't it be faster to get it directly from P2's RAM? And how does it perform the lookup in P2's directory in the first place? From what I understand, the directory literally sits on top of RAM.
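To make the scenario I have in mind concrete, here is a toy model of how I picture the directory's decision. This is purely my own sketch (state names loosely follow MOESI-style protocols, and "HomeNode", "DirectoryEntry", etc. are invented for illustration), not real hardware behavior:

```python
class DirectoryEntry:
    """One directory entry for a cache line whose home is P2's RAM."""
    def __init__(self):
        self.state = "Uncached"   # Uncached / cached by some owner
        self.owner = None         # socket currently holding the line as (O)wner
        self.sharers = set()      # sockets holding a shared copy

class HomeNode:
    """Toy directory logic at the line's home node (P2 in my example)."""
    def __init__(self):
        self.entry = DirectoryEntry()

    def load(self, requester):
        e = self.entry
        if e.state == "Uncached":
            # No cached copy anywhere: serve the line from the home node's RAM
            # and record the requester as the new owner.
            e.state, e.owner = "Owned", requester
            return "P2 RAM"
        # Some cache already owns the line: the directory forwards the request
        # to the owner, which supplies the data cache-to-cache. RAM is not
        # read, because the owner's copy may be newer (dirty) than RAM.
        e.sharers.add(requester)
        return f"{e.owner} L3 cache"

home = HomeNode()
print(home.load("P2"))  # first LOAD: served from P2 RAM, P2 becomes owner
print(home.load("P4"))  # second LOAD: forwarded to owner, served from P2 L3 cache
```

If this model is roughly right, the answer to my own question would be "always from the owner's L3 while an owner exists", but I'd like someone to confirm whether real directories actually work this way.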
Sorry if I'm grossly misunderstanding what is going on here, but I hope someone can help clarify this.
Also, is there any data on how fast such directory lookups are? Is there documentation on the typical latencies involved, e.g. how many cycles for an L3 read, a RAM read, a directory lookup, and so on?