L2 Cache in NVIDIA Fermi

Question


While looking at the names of the performance counters for the NVIDIA Fermi architecture (in the Compute_profiler.txt file in the CUDA doc folder), I noticed that there are two performance counters for L2 cache read misses: l2_subp0_read_sector_misses and l2_subp1_read_sector_misses. According to the file, these correspond to two slices of L2.

Why does L2 have two slices? Is this related to the streaming multiprocessor architecture? How does this split affect performance?

Thanks


gpu gpgpu cuda nvidia

Zk1001 Aug 6 '11 at 9:42


2 answers

The CUDA C Programming Guide describes the multiprocessor architecture and states that each Fermi multiprocessor has two warp schedulers. I suspect the L2 cache is split to allow concurrent caching, possibly one slice per warp scheduler.

I have not looked at the L2 read-miss counters for the Kepler architecture, but Kepler multiprocessors have four warp schedulers. So this hypothesis could be checked by seeing whether Kepler exposes four such performance counters.
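The check proposed above can be sketched in Python. This is a minimal sketch that assumes you have already copied the counter names for an architecture (for example from Compute_profiler.txt) into a list, and that other architectures follow the same `l2_subpN_read_sector_misses` naming pattern; both are assumptions. `count_l2_slices` is a hypothetical helper, not part of any NVIDIA tool.

```python
import re

# Matches per-slice L2 read-miss counter names such as
# "l2_subp0_read_sector_misses". Assumption: the same naming
# pattern is used on architectures other than Fermi.
SLICE_COUNTER = re.compile(r"^l2_subp(\d+)_read_sector_misses$")


def count_l2_slices(counter_names):
    """Return how many distinct L2 sub-partitions have a read-miss
    counter in the given list of counter names."""
    slices = set()
    for name in counter_names:
        match = SLICE_COUNTER.match(name.strip())
        if match:
            slices.add(int(match.group(1)))
    return len(slices)


# Illustrative input: the two Fermi counters plus a non-matching name.
fermi_counters = [
    "l2_subp0_read_sector_misses",
    "l2_subp1_read_sector_misses",
    "some_other_counter",
]
print(count_l2_slices(fermi_counters))  # prints 2
```

If the hypothesis holds, running the same function over a Kepler counter list would return 4.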


Ryan stovall May 19 '12 at 6:22


fabrizioM · Accepted Answer · 2011-08-09T23:17:06+0000

I do not think there is a direct connection with the streaming multiprocessor.

I think a slice is simply the equivalent of a memory bank.
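To illustrate the bank analogy, here is a hypothetical sketch of round-robin interleaving, in which consecutive 32-byte sectors alternate between two slices the way addresses alternate between memory banks. The sector size and the simple modulo mapping are assumptions for illustration only; the actual address-to-slice mapping is not described in this thread and may well be more complex (for example, hashed).

```python
# Hypothetical parameters, chosen for illustration only.
SECTOR_BYTES = 32  # assumed interleaving granularity
NUM_SLICES = 2     # one per l2_subpN counter on Fermi


def slice_for_address(addr):
    """Map a byte address to an L2 slice by round-robin interleaving
    of sectors, analogous to spreading addresses across memory banks."""
    return (addr // SECTOR_BYTES) % NUM_SLICES


# Consecutive 32-byte sectors alternate between slice 0 and slice 1.
print([slice_for_address(a) for a in range(0, 128, 32)])  # [0, 1, 0, 1]
```

Under such a scheme, accesses spread evenly over both slices, which is why each slice keeps its own miss counter.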

Just sum the values of the two counters to get the total number of L2 read misses.
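A minimal sketch of that summation, assuming the profiler results for a kernel have already been parsed into a dictionary keyed by counter name. The parsing step is not shown, and the sample numbers are made up.

```python
def total_l2_read_misses(counters):
    """Sum the two per-slice L2 read-miss counters reported on Fermi."""
    return (counters["l2_subp0_read_sector_misses"]
            + counters["l2_subp1_read_sector_misses"])


# Made-up values for illustration only.
sample = {
    "l2_subp0_read_sector_misses": 1200,
    "l2_subp1_read_sector_misses": 1180,
}
print(total_l2_read_misses(sample))  # 2380
```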
