The answer is the second option, but from there it gets a little more complicated. There is no such thing as dedicated "texture memory", only global memory, which textures access through dedicated hardware: a read-only texture cache on the GPU (6-8 KB per multiprocessor depending on the card, see Table F-2 in Appendix F of the CUDA Programming Guide) plus a set of hardware-accelerated filtering/interpolation operations. There are two ways to use the texture hardware in CUDA:
- Bind linear memory to a texture and read it in the kernel using the 1D fetch API. In this case the texture hardware really just acts as a pass-through cache, and (IIRC) no filtering operations are available.
- Create a CUDA array, copy the contents of the linear memory into that array, and bind it to the texture. The resulting CUDA array holds a spatially ordered version of the linear source, stored in global memory along an (undocumented) space-filling curve. The texture hardware provides cached access to this array, including single-instruction reads combined with hardware-accelerated filtering/interpolation.
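The two approaches above can be sketched as follows. This is a minimal, hedged example using the texture reference API that was current in the GT200 era (it is deprecated in CUDA 11 and removed in CUDA 12, where texture objects replace it); the kernel and variable names are purely illustrative, not from the original answer.

```cuda
#include <cuda_runtime.h>

// Case 1: texture bound directly to linear memory.
// The texture unit acts as a read-only pass-through cache; access is via
// tex1Dfetch() with an integer index, and no filtering is available.
texture<float, 1, cudaReadModeElementType> texLinear;

__global__ void readLinear(float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = tex1Dfetch(texLinear, i);   // integer index, cached read only
}

// Case 2: texture bound to a CUDA array.
// The array is a spatially reordered copy of the source, and reads via
// tex1D() take a floating-point coordinate with hardware filtering.
texture<float, 1, cudaReadModeElementType> texArray;

__global__ void readFiltered(float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = tex1D(texArray, i + 0.5f);  // float coordinate, hw interpolation
}

int main(void)
{
    const int n = 256;
    float *d_src, *d_out;
    cudaMalloc(&d_src, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));

    // Case 1: bind the linear allocation directly to the texture reference.
    cudaBindTexture(NULL, texLinear, d_src, n * sizeof(float));
    readLinear<<<1, n>>>(d_out, n);
    cudaUnbindTexture(texLinear);

    // Case 2: stage the data through a CUDA array, then bind that.
    cudaArray *arr;
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaMallocArray(&arr, &desc, n, 1);
    cudaMemcpyToArray(arr, 0, 0, d_src, n * sizeof(float),
                      cudaMemcpyDeviceToDevice);
    texArray.filterMode = cudaFilterModeLinear;  // enable hw interpolation
    cudaBindTextureToArray(texArray, arr);
    readFiltered<<<1, n>>>(d_out, n);
    cudaUnbindTexture(texArray);

    cudaFreeArray(arr);
    cudaFree(d_src);
    cudaFree(d_out);
    return 0;
}
```

Note the asymmetry: only the CUDA-array path pays the cost of an extra staging copy, but only it unlocks the filtering hardware.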
David Kanter has written an excellent review of the GT200 architecture that is worth reading to better understand how the actual hardware implements the memory hierarchy the APIs expose.
talonmies