GPU L1 and L2 cache statistics

I wrote some simple tests that perform a series of global memory accesses. When I measured the L1 and L2 cache statistics, I found that (on a GTX580 with 16 SMs):

 total L1 cache misses * 16 != total L2 cache queries

In fact, the right-hand side is much higher than the left (about five times). I have heard that spilled register values can also end up in L2, but my kernel uses fewer than 28 registers, which is not that many. What could be the source of this difference? Or am I misinterpreting the meaning of these performance counters?

Thanks.

2 answers

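From the CUDA Programming Guide, Section G.4.2:

Global memory accesses are cached. Using the -dlcm compilation flag, they can be configured at compile time to be cached in both L1 and L2 (-Xptxas -dlcm=ca) (this is the default setting) or in L2 only (-Xptxas -dlcm=cg). A cache line is 128 bytes and maps to a 128-byte aligned segment in device memory. Memory accesses that are cached in both L1 and L2 are serviced with 128-byte memory transactions, whereas memory accesses that are cached in L2 only are serviced with 32-byte memory transactions. Caching in L2 only can therefore reduce over-fetch, for example, in the case of scattered memory accesses.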


This may be because L1 is accessed in 128-byte cache lines, while L2 is accessed in 32-byte segments.
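To make the arithmetic concrete, here is a rough sketch of how the two granularities could affect the counters. It assumes that every L1 miss is serviced as four separate 32-byte L2 queries and that the L1 counter is reported for a single SM. Both are assumptions for illustration, not documented counter semantics. The function name and the counter value are hypothetical.

```python
L1_LINE_BYTES = 128    # L1 cache line size (from the quoted guide section)
L2_SEGMENT_BYTES = 32  # L2 transaction size (from the quoted guide section)


def expected_l2_queries(l1_misses_per_sm: int, num_sms: int) -> int:
    """Estimate L2 queries, ASSUMING each L1 miss becomes one L2 query
    per 32-byte segment of the missed 128-byte line."""
    segments_per_line = L1_LINE_BYTES // L2_SEGMENT_BYTES  # 128 / 32 = 4
    return l1_misses_per_sm * num_sms * segments_per_line


# Hypothetical counter value, not a measurement from the question.
l1_misses_per_sm = 1000
num_sms = 16  # GTX580

naive = l1_misses_per_sm * num_sms  # left-hand side of the question's formula
estimate = expected_l2_queries(l1_misses_per_sm, num_sms)

print(naive, estimate, estimate / naive)
```

Under these assumptions the L2 query count comes out four times higher than the naive product, which covers most of the roughly fivefold gap reported in the question. The sketch does not account for the remainder.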
