I looked through the programming guide and the best practices guide, and they mention that a global memory access takes 400-600 cycles. I did not find much on the other memory types, such as the texture cache, the constant cache, and shared memory. Registers have zero memory latency.
I think the constant cache is as fast as registers if all threads read the same address in it. For the worst case, I'm not sure.
Is shared memory as fast as registers as long as there are no bank conflicts? If there are conflicts, how does the latency change?
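
To make the bank-conflict part of the question concrete, here is a minimal host-side sketch in plain C++ (not device code) that counts how many serialized passes one warp-wide shared memory read would need for a given stride. It assumes 32 banks of 4-byte words and 32 threads per warp, with reads of the same word served by a single broadcast. Those figures and the name `conflict_degree` are assumptions for illustration, not values taken from the guides. It models only the degree of serialization, not actual cycle counts.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <set>

// Assumed hardware parameters (illustrative, not from the guides):
// 32 banks of 4-byte words, 32 threads per warp.
constexpr std::uint32_t kBanks = 32;
constexpr std::uint32_t kWarpSize = 32;

// Hypothetical helper: thread t reads the 4-byte word at index
// t * stride_words. Returns the number of serialized passes needed:
// 1 means conflict-free, N means an N-way bank conflict.
int conflict_degree(std::uint32_t stride_words) {
    // For each bank, collect the distinct words requested from it.
    // Threads reading the same word are assumed to share one broadcast.
    std::array<std::set<std::uint32_t>, kBanks> words_per_bank;
    for (std::uint32_t t = 0; t < kWarpSize; ++t) {
        std::uint32_t word = t * stride_words;
        words_per_bank[word % kBanks].insert(word);
    }
    // The busiest bank determines how many passes the access takes.
    int worst = 1;
    for (const auto& words : words_per_bank) {
        worst = std::max(worst, static_cast<int>(words.size()));
    }
    return worst;
}
```

Under these assumptions, a stride of 1 word or any odd stride is conflict-free, a stride of 2 gives a 2-way conflict, and a stride of 32 sends every thread to the same bank.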
What about the texture cache?