How can I read from pinned (page-locked) RAM, and not from the CPU cache (zero-copy DMA with a GPU)?

If I use DMA for RAM ↔ GPU transfers in CUDA C++, how can I be sure that memory will be read from pinned (page-locked) RAM and not from the CPU cache?

After all, with DMA the CPU knows nothing about someone else changing memory, so it sees no need to synchronize its caches with RAM (Cache ↔ RAM). And as far as I know, a C++11 memory fence such as `std::atomic_thread_fence()` does not help with DMA: it will not force reads from RAM, but only enforces ordering among the L1/L2/L3 caches. Moreover, in the general case there is no protocol for resolving a conflict between the cache and RAM on the processor; there are only coherence protocols between the different cache levels (L1/L2/L3) and between CPUs in multiprocessor NUMA systems: MOESI / MESIF.
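
For context, here is a minimal sketch (illustrative, not from the question) of the zero-copy setup being asked about: pinned host memory allocated with `cudaHostAlloc()` and the `cudaHostAllocMapped` flag, with a device-side alias obtained via `cudaHostGetDevicePointer()`. The buffer size and the commented-out kernel name are assumptions:

```cpp
#include <cuda_runtime.h>

int main()
{
    // Mapped pinned memory must be enabled before any CUDA allocation.
    cudaSetDeviceFlags(cudaDeviceMapHost);

    float *host_buf = nullptr;
    // Pinned (page-locked) host allocation, mapped into the GPU address space.
    cudaHostAlloc((void **)&host_buf, 1024 * sizeof(float), cudaHostAllocMapped);

    float *dev_ptr = nullptr;
    // Device-side pointer aliasing the same physical pinned pages; a kernel
    // dereferencing dev_ptr performs DMA to/from host RAM on each access,
    // with no cudaMemcpy involved.
    cudaHostGetDevicePointer((void **)&dev_ptr, host_buf, 0);

    // my_kernel<<<grid, block>>>(dev_ptr);  // hypothetical kernel

    cudaFreeHost(host_buf);
    return 0;
}
```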

Tags: synchronization, caching, gpgpu, cuda, dma
1 answer

On x86, the CPU does snoop bus traffic, so this is not a problem. On Sandy Bridge-class processors, the PCI Express controller is integrated into the CPU, so the CPU can actually service GPU reads from its L3 cache and update its cache based on writes by the GPU.
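
To illustrate the consequence of this snooping, here is a sketch (illustrative, not from the answer) in which a GPU kernel writes into mapped pinned memory and the CPU polls it, with no explicit cache flush on the host side. `__threadfence_system()` orders the GPU's data write before its flag write; the kernel `produce` and the two-slot buffer layout are assumptions for the example:

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative kernel: writes data into pinned host memory over PCIe, then
// raises a flag. __threadfence_system() ensures the data write becomes
// visible to the host before the flag write does.
__global__ void produce(volatile int *data, volatile int *flag)
{
    *data = 42;
    __threadfence_system();
    *flag = 1;
}

int main()
{
    cudaSetDeviceFlags(cudaDeviceMapHost);

    volatile int *host_mem = nullptr;
    cudaHostAlloc((void **)&host_mem, 2 * sizeof(int), cudaHostAllocMapped);
    host_mem[0] = 0;  // data slot
    host_mem[1] = 0;  // flag slot

    int *dev_mem = nullptr;
    cudaHostGetDevicePointer((void **)&dev_mem, (void *)host_mem, 0);

    produce<<<1, 1>>>(&dev_mem[0], &dev_mem[1]);

    // The CPU observes the GPU's DMA writes without flushing its caches,
    // because the integrated PCIe controller snoops them. `volatile` only
    // keeps the compiler from hoisting the loads out of the loop.
    while (host_mem[1] == 0) { }
    printf("data = %d\n", host_mem[0]);

    cudaDeviceSynchronize();
    cudaFreeHost((void *)host_mem);
    return 0;
}
```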

