I understand that Fermi GPUs support prefetching to the L1 or L2 cache. However, I cannot find anything about it in the CUDA Reference Guide.
Does CUDA allow my kernel code to prefetch specific data to a specific cache level?
caching prefetch gpgpu cuda ptx
dalibocai