Occupancy is defined as the number of active warps divided by the maximum number of warps that one streaming multiprocessor (SM) supports. Let's say I have 4 blocks running on one SM, and each block has 320 threads, i.e. 10 warps, so there are 40 warps on one SM. The occupancy is then 40/48, assuming the maximum number of resident warps per SM is 48 (CC 2.x).
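The occupancy arithmetic above can be sketched as follows. This is a minimal illustration, not a CUDA API call; it assumes a warp size of 32 and the CC 2.x limit of 48 resident warps per SM mentioned above. The helper names `warps_per_block` and `occupancy` are hypothetical.

```python
# Assumed limits for compute capability 2.x (Fermi), as stated in the question.
WARP_SIZE = 32          # threads per warp on NVIDIA GPUs
MAX_WARPS_PER_SM = 48   # maximum resident warps per SM on CC 2.x

def warps_per_block(threads_per_block: int) -> int:
    """Warps a block occupies; a partial warp still counts as a whole warp."""
    return -(-threads_per_block // WARP_SIZE)  # ceiling division

def occupancy(threads_per_block: int, active_blocks_per_sm: int) -> float:
    """Theoretical occupancy = active warps per SM / max resident warps per SM."""
    active_warps = warps_per_block(threads_per_block) * active_blocks_per_sm
    return active_warps / MAX_WARPS_PER_SM

active_warps = warps_per_block(320) * 4   # 10 warps per block * 4 blocks
print(active_warps, round(occupancy(320, 4), 3))  # 40 0.833
```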
But overall I have 320 * 4 = 1280 threads running on one SM, and one SM has only 48 CUDA cores. Why is the occupancy not 100%? I'm using all the CUDA cores...
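The two ratios being compared here can be computed side by side. This sketch only uses the figures from the question (320 threads per block, 4 blocks, 48 cores, 48 warp slots) plus an assumed warp size of 32; note that the first ratio divides by CUDA cores while the occupancy divides by resident warp slots.

```python
# Figures as stated in the question; warp size of 32 is assumed.
THREADS_PER_BLOCK = 320
BLOCKS_PER_SM = 4
CUDA_CORES_PER_SM = 48
MAX_WARPS_PER_SM = 48
WARP_SIZE = 32

resident_threads = THREADS_PER_BLOCK * BLOCKS_PER_SM     # 1280 threads on the SM
threads_per_core = resident_threads / CUDA_CORES_PER_SM  # ratio against cores (well above 1)
resident_warps = resident_threads // WARP_SIZE           # 40 warps
occupancy = resident_warps / MAX_WARPS_PER_SM            # ratio against warp slots (40/48)

print(resident_threads, round(threads_per_core, 1), resident_warps, round(occupancy, 3))
```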
I'm sure I'm missing something...