The value of bandwidth in CUDA and why it matters

The CUDA Programming Guide states that

"Bandwidth is one of the most important factors affecting performance. Almost all code changes should be made in the context of how they affect bandwidth."

The guide then calculates the theoretical bandwidth, which comes out to hundreds of gigabytes per second. I do not understand why the number of bytes that can be read from / written to global memory per second is a reflection of how well a kernel is optimized.
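As I understand it, the effective bandwidth in the guide is just (bytes read + bytes written) divided by the kernel's elapsed time, and the theoretical figure comes from the card's memory clock and bus width. Here is a minimal sketch of how I would measure it with CUDA events (the copyKernel and the array size are just placeholders I made up):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel: copies n floats, i.e. n*4 bytes read + n*4 bytes written.
__global__ void copyKernel(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[i];
}

int main()
{
    const int n = 1 << 24;                    // 16M elements (assumed size)
    const size_t bytes = n * sizeof(float);

    float *d_in, *d_out;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemset(d_in, 0, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    copyKernel<<<(n + 255) / 256, 256>>>(d_in, d_out, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    // Effective bandwidth = (bytes read + bytes written) / elapsed time
    double gbPerSec = (2.0 * bytes / 1e9) / (ms / 1e3);
    printf("Effective bandwidth: %.1f GB/s\n", gbPerSec);

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```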

If I have a kernel that does intensive computation on data stored in shared memory and/or registers, with only a single read from global memory at the start and a single write back to global memory at the end, then of course the effective bandwidth will be small, while the kernel itself can still be very efficient.
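For concreteness, something like the following is what I have in mind (the loop count and the arithmetic are made up, just to keep the ALUs busy):

```cuda
// Hypothetical compute-bound kernel: one read and one write to global memory,
// with thousands of arithmetic operations in registers in between.
__global__ void iterateKernel(const float *in, float *out, int n, int iters)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float x = in[i];                  // single read from global memory
    for (int k = 0; k < iters; ++k)   // heavy work entirely in registers
        x = x * 1.000001f + 0.5f;
    out[i] = x;                       // single write to global memory
}
```

With a large iters, the measured effective bandwidth is tiny, yet the kernel keeps the multiprocessors busy the whole time.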

Can anyone explain what bandwidth means in this context?

Thanks.

+5
3 answers

In practice, most kernels are memory bound rather than compute bound: they do relatively little arithmetic per byte of data they touch, so their runtime is determined by how fast they can read from and write to global memory. That is true of many, probably most, real-world kernels.

For those kernels, comparing the achieved (effective) bandwidth against the theoretical peak tells you how much room is left for optimization. A compute-heavy kernel like the one you describe is the exception rather than the rule.
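As a simple illustration, take a plain vector add (my own example): it does one addition per 12 bytes of global memory traffic, so its runtime is set almost entirely by bandwidth:

```cuda
// Typical memory-bound kernel: one add per 12 bytes of global memory traffic
// (two 4-byte reads and one 4-byte write per element), so its runtime is
// dictated by how fast global memory can be read and written.
__global__ void vecAdd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}
```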

+4

Measuring the effective bandwidth of a kernel and comparing it with the theoretical peak of the card tells you how far the kernel is from what the hardware can deliver. If it is well below the peak, there is probably something to gain from improving the memory access pattern (coalescing, using shared memory, and so on); if it is already close to the peak, the kernel is limited by the hardware and further memory optimization will not help much.

The Advanced CUDA C presentation covers this in some detail, as does the CUDA Best Practices Guide, which describes how to optimize CUDA code for bandwidth (both are available from NVIDIA).

+1

Usually kernels are quite small and simple, and perform the same operation on a large amount of data. You may have many kernels that you invoke sequentially to perform a more complex operation (think of it as a processing pipeline). Obviously, the throughput of your pipeline will depend both on how efficient your kernels are and on whether you are limited by memory bandwidth.
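A rough sketch of that idea (the stages below are placeholders, not a real workload):

```cuda
#include <cuda_runtime.h>

// Sketch of a three-stage pipeline; each stage is a small, simple kernel.
__global__ void stage1(float *d, int n) { int i = blockIdx.x * blockDim.x + threadIdx.x; if (i < n) d[i] *= 2.0f; }
__global__ void stage2(float *d, int n) { int i = blockIdx.x * blockDim.x + threadIdx.x; if (i < n) d[i] += 1.0f; }
__global__ void stage3(float *d, int n) { int i = blockIdx.x * blockDim.x + threadIdx.x; if (i < n) d[i] = d[i] * d[i]; }

void runPipeline(float *d_data, int n)
{
    dim3 block(256);
    dim3 grid((n + block.x - 1) / block.x);

    // Launches on the same (default) stream execute in order, so each stage
    // sees the previous stage's output. Every stage re-reads and re-writes the
    // whole buffer, so memory bandwidth sets the ceiling for the pipeline.
    stage1<<<grid, block>>>(d_data, n);
    stage2<<<grid, block>>>(d_data, n);
    stage3<<<grid, block>>>(d_data, n);
    cudaDeviceSynchronize();
}
```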

0