Why is CUDA occupancy defined as the number of active warps over the maximum number of warps supported?

Occupancy is defined as the number of active warps divided by the maximum number of warps supported on one streaming multiprocessor (SM). Let's say I have 4 blocks running on one SM, and each block has 320 threads, i.e. 10 warps, so there are 40 warps on one SM. The occupancy is 40/48, assuming that the maximum number of warps on one SM is 48 (CC 2.x).
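For reference, the arithmetic above can be written out as a small host-side sketch. The function names `warps_per_block` and `occupancy` are made up for this illustration and are not part of the CUDA API; the 48-warp limit is the CC 2.x figure quoted in the question, and a warp size of 32 threads is assumed.

```cpp
// Warp size assumed to be 32 threads (true for the CC 2.x devices discussed here).
constexpr int kWarpSize = 32;

// Hypothetical helper: warps occupied by one block.
// A partially filled warp still takes up a whole warp slot, hence the round-up.
constexpr int warps_per_block(int threads_per_block) {
    return (threads_per_block + kWarpSize - 1) / kWarpSize;
}

// Hypothetical helper: theoretical occupancy of one SM, i.e.
// resident warps divided by the maximum number of resident warps.
constexpr double occupancy(int blocks_per_sm, int threads_per_block,
                           int max_warps_per_sm) {
    return static_cast<double>(blocks_per_sm * warps_per_block(threads_per_block))
           / max_warps_per_sm;
}
```

Calling `occupancy(4, 320, 48)` evaluates 40/48, roughly 0.83, which matches the figure in the question. Note that the number of CUDA cores does not appear anywhere in the formula.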

But in total I have 320 * 4 threads running on one SM, and there are only 48 CUDA cores on one SM. Why is the occupancy not 100%? I'm using all the CUDA cores ...

I'm sure I'm missing something ...

1 answer

Because occupancy has nothing to do with cores. CUDA is a pipelined SIMD architecture. Your 48 cores are fed by pipelined warp instructions (the hardware is actually dual-issue). You need many warps to keep the instruction pipeline full, otherwise all the cores will stall. That is why occupancy is a somewhat useful metric for quantifying the ability of a given kernel to provide enough parallel work to achieve reasonable performance.
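A rough way to see why many warps are needed is a Little's-law style estimate: the number of warps that must be resident is about the instruction latency multiplied by the issue rate. The sketch below is a deliberately simplified model, not a description of the real scheduler. It assumes each warp has only one instruction in flight at a time (no instruction-level parallelism), and the function names and the latency value used in the example call are illustrative assumptions rather than measured figures.

```cpp
#include <algorithm>

// Hypothetical helper: warps needed so that some warp is always ready to issue.
// Little's law: work in flight = latency (cycles) * issue rate (instructions per cycle).
constexpr int warps_to_hide_latency(int latency_cycles, int issues_per_cycle) {
    return latency_cycles * issues_per_cycle;
}

// Hypothetical helper: fraction of issue slots that can be filled when only
// `resident_warps` warps are available, capped at 1.0 (pipeline kept full).
inline double issue_utilization(int resident_warps, int latency_cycles,
                                int issues_per_cycle) {
    const double needed = warps_to_hide_latency(latency_cycles, issues_per_cycle);
    return std::min(1.0, resident_warps / needed);
}
```

With an assumed latency of 20 cycles and dual issue, `warps_to_hide_latency(20, 2)` gives 40 warps. Under this model the 40 resident warps from the question would keep the pipeline full, while `issue_utilization(10, 20, 2)` gives 0.25, meaning a single 10-warp block would leave most issue slots empty even though it has far more threads than the SM has cores.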
