CUDA Constant Memory Banks

When we check resource usage with -Xptxas -v, we see output like this:

ptxas info : Used 63 registers, 244 bytes cmem[0], 51220 bytes cmem[2], 24 bytes cmem[14], 20 bytes cmem[16] 
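A minimal sketch, assuming the line format shown above (ptxas output may differ between CUDA toolkit versions), that extracts the per-bank constant memory usage into a dictionary keyed by bank number. The function names `parse_cmem` and `parse_registers` are hypothetical helpers, not part of any NVIDIA tool.

```python
import re
from typing import Dict, Optional

# Matches fragments such as "51220 bytes cmem[2]" (assumed format, see above).
CMEM_PATTERN = re.compile(r"(\d+) bytes cmem\[(\d+)\]")
REG_PATTERN = re.compile(r"Used (\d+) registers")


def parse_cmem(line: str) -> Dict[int, int]:
    """Map each constant bank index n to the bytes reported for cmem[n]."""
    return {int(bank): int(size) for size, bank in CMEM_PATTERN.findall(line)}


def parse_registers(line: str) -> Optional[int]:
    """Return the register count from a ptxas info line, or None if absent."""
    match = REG_PATTERN.search(line)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    sample = ("ptxas info : Used 63 registers, 244 bytes cmem[0], "
              "51220 bytes cmem[2], 24 bytes cmem[14], 20 bytes cmem[16]")
    print(parse_cmem(sample))       # {0: 244, 2: 51220, 14: 24, 16: 20}
    print(parse_registers(sample))  # 63
```

Only the `cmem[n]` fragments are picked up; other fields ptxas may print on the same line (shared memory, stack frame) are ignored by this sketch.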

I wonder if there is currently any documentation that clearly explains cmem[x]. What is the point of dividing constant memory into several banks, how many banks are there in general, and what are banks other than 0, 2, 14 and 16 used for?

As a side note, @njuffa (thank you) previously explained on the NVIDIA forum what banks 0, 2, 14 and 16 are:

The used constant memory is partitioned into constant program variables (bank 1), plus compiler-generated constants (bank 14).

cmem[0]: kernel arguments

cmem[2]: user-defined constant objects

cmem[16]: compiler-generated constants (some of which may correspond to literal constants in the source code)
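The bank roles listed above can be captured in a small lookup table and used to label per-bank usage. This is a sketch under the assumption that the Fermi-era mapping quoted above applies; it is reverse-engineered, not official, and the table name `FERMI_BANK_ROLES` and function `describe_banks` are hypothetical.

```python
from typing import Dict, List

# Bank roles as reported in the forum post quoted above (Fermi-class GPUs).
# Obtained by reverse engineering; other GPU generations may differ.
FERMI_BANK_ROLES: Dict[int, str] = {
    0: "kernel arguments",
    2: "user-defined constant objects",
    16: "compiler-generated constants",
}


def describe_banks(usage: Dict[int, int]) -> List[str]:
    """Return one human-readable line per bank, in ascending bank order."""
    lines = []
    for bank in sorted(usage):
        role = FERMI_BANK_ROLES.get(bank, "role not listed")
        lines.append(f"cmem[{bank}]: {usage[bank]} bytes ({role})")
    return lines


if __name__ == "__main__":
    for text in describe_banks({0: 244, 2: 51220, 14: 24, 16: 20}):
        print(text)
```

Bank 14 is reported as "role not listed" because the list above does not cover it, even though the quoted sentence mentions it for compiler-generated constants.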

2 answers

As far as I know, the use of constant banks on CUDA GPUs is not officially documented. The number and usage of constant banks vary between GPU generations. These are low-level implementation details that programmers need not worry about.

If necessary, the usage of constant banks can be reverse-engineered by examining the machine code (SASS) generated for a given platform. In fact, that's how I came up with the information cited in the original question (I posted it in the NVIDIA developer forums). As far as I recall, the information I gave there was based on ad-hoc reverse engineering, specifically applied to Fermi-class devices, but I cannot verify this at this time because the forums are currently unavailable.

One reason for having multiple constant banks is to reserve user-visible constant memory for use by CUDA programmers, while keeping additional read-only information provided by the hardware or tools in other constant banks.

Please note that the CUDA math library is provided as source files and its functions are inlined into user code. Therefore, constant memory used by the CUDA math library functions is included in the statistics for user-visible constant memory.


See "Other Uses for NVCC". It notes that constant bank allocation is profile-specific.

The PTX manual says that, besides the 64 KB of constant memory, there are ten additional 64 KB banks of constant memory. The driver can allocate and initialize constant buffers in these regions and pass pointers to the buffers as kernel function parameters.

I assume that the profile selected for nvcc determines which constants go into which bank. In any case, we need not worry as long as each cmem[n] stays within 64 KB, because each bank is 64 KB in size and is shared by all threads in the grid.
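The per-bank rule above is simple arithmetic: compare each reported cmem[n] against 64 KB (65536 bytes). A minimal sketch, assuming the 64 KB per-bank size stated in this answer; the helper name `banks_over_limit` is hypothetical.

```python
from typing import Dict, List

BANK_SIZE = 64 * 1024  # 65536 bytes per constant bank, as stated above


def banks_over_limit(usage: Dict[int, int], bank_size: int = BANK_SIZE) -> List[int]:
    """Return the bank indices whose reported usage exceeds bank_size."""
    return sorted(bank for bank, size in usage.items() if size > bank_size)


if __name__ == "__main__":
    usage = {0: 244, 2: 51220, 14: 24, 16: 20}  # values from the question
    print(banks_over_limit(usage))              # []
    print(BANK_SIZE - usage[2])                 # 14316 bytes left in cmem[2]
```

For the numbers in the question, the largest bank (cmem[2], 51220 bytes) uses about 78% of its 64 KB, leaving 14316 bytes of headroom.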

