NUMA is a shared-memory system, so a memory access from any processor can reach any memory without blocking. If the memory model were message-based, accessing remote memory would require the executing processor to ask the processor local to that memory to perform the operation on its behalf. In a NUMA system, however, a remote processor can still affect the performance of the processor closest to that memory, because its accesses use the same memory links and controller, although how much depends on the particular architectural configuration.
As for 1, it depends entirely on the OS and the malloc library. The OS is responsible for presenting the memory of each core or processor either as a single unified space or as NUMA. Malloc may or may not be NUMA-aware. More fundamentally, a given malloc implementation may or may not be able to execute concurrently with other requests. The answer from Al (and the related discussion) addresses this point in more detail.
As for 2, since memcpy consists of a series of loads and stores, the only impact will again be the potential architectural effects of using the memory controllers of other processors, etc.
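To make the "series of loads and stores" point concrete, here is a minimal sketch in C. The names `copy_bytes` and `copies_match` are hypothetical helpers invented for illustration, not part of any library. Real memcpy implementations typically use wider or vectorized accesses, so this only shows the principle: nothing in the copy asks another processor to do work, and each access simply goes to whichever node holds the page.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy n bytes using one load and one store per byte.
 * On a NUMA machine each load or store is routed by hardware to the node
 * that owns the page; the code itself is the same for local and remote memory. */
void *copy_bytes(void *dst, const void *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t i = 0; i < n; i++) {
        d[i] = s[i];            /* load from src, store to dst */
    }
    return dst;
}

/* Hypothetical check: returns 1 if copy_bytes and memcpy produce the same bytes. */
int copies_match(void)
{
    const char src[] = "remote node data";
    char a[sizeof src];
    char b[sizeof src];

    copy_bytes(a, src, sizeof src);
    memcpy(b, src, sizeof src);
    return memcmp(a, b, sizeof src) == 0;
}
```

Any cost difference between copying local and remote buffers would show up only in timing, not in the result, which is why the sketch compares contents and does not attempt to measure latency.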
Brian