It is separate. Texture loads do not pass through the L1 cache. For applications that are not actually texturing (i.e. you are not using features such as interpolation and clamping), the main benefit of the texture path is that it lets you selectively map a large region of global memory that can potentially be cached (depending on locality and reuse) without disturbing what is going on in L1.

For small data sets, texturing will not give better performance than L1. For large data sets where there is some locality and reuse, but where the loads from the region covered by the texture cache would otherwise thrash L1 (which may be as small as 16 KB per SM on Fermi, depending on the cache configuration), the texture cache can benefit the application as a whole. Users often find that texture access is not as fast as having the data cached in L1, but much faster than uncoalesced or scattered loads that thrash L1. A lot depends on the access pattern and the data size.

The texture cache is about 8 KB per SM. You can map a much larger region than that, but high levels of reuse and locality will definitely improve texture cache performance. Also note that texture memory is read-only. You might be interested in this webinar.
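To make the "texture path without texturing" idea concrete, here is a minimal sketch of a scattered read that goes through the texture cache instead of L1. It assumes the texture object API, which needs compute capability 3.0 or later, so on Fermi itself you would have had to use the older texture reference API instead. The names `gather_tex` and `count_texture_mismatches`, the stride of 7 used to scatter the indices, and the block size are all made up for illustration. No filtering or address clamping is involved, only plain element fetches.

```cuda
#include <cstring>
#include <vector>
#include <cuda_runtime.h>

// Each thread fetches one float through the texture path at a scattered index.
// tex1Dfetch does no interpolation; it is a plain cached, read-only load.
__global__ void gather_tex(cudaTextureObject_t tex, const int* idx, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = tex1Dfetch<float>(tex, idx[i]);
    }
}

// Hypothetical helper: binds a linear buffer to a texture object, runs the
// gather, and returns the number of mismatches against a host reference
// (or -1 if any CUDA call failed).
int count_texture_mismatches(int n)
{
    std::vector<float> h_data(n), h_out(n, -1.0f);
    std::vector<int> h_idx(n);
    for (int i = 0; i < n; ++i) {
        h_data[i] = static_cast<float>(i);
        h_idx[i] = static_cast<int>((7LL * i) % n);  // scattered, but in range
    }

    float* d_data = nullptr;
    float* d_out = nullptr;
    int* d_idx = nullptr;
    cudaTextureObject_t tex = 0;
    const size_t fbytes = static_cast<size_t>(n) * sizeof(float);
    const size_t ibytes = static_cast<size_t>(n) * sizeof(int);

    bool ok = cudaMalloc(&d_data, fbytes) == cudaSuccess
           && cudaMalloc(&d_out, fbytes) == cudaSuccess
           && cudaMalloc(&d_idx, ibytes) == cudaSuccess
           && cudaMemcpy(d_data, h_data.data(), fbytes, cudaMemcpyHostToDevice) == cudaSuccess
           && cudaMemcpy(d_idx, h_idx.data(), ibytes, cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        // Describe the existing global-memory buffer as a linear texture resource.
        cudaResourceDesc res;
        memset(&res, 0, sizeof(res));
        res.resType = cudaResourceTypeLinear;
        res.res.linear.devPtr = d_data;
        res.res.linear.desc = cudaCreateChannelDesc<float>();
        res.res.linear.sizeInBytes = fbytes;

        // Return raw element values (no normalization or conversion).
        cudaTextureDesc td;
        memset(&td, 0, sizeof(td));
        td.readMode = cudaReadModeElementType;

        ok = cudaCreateTextureObject(&tex, &res, &td, nullptr) == cudaSuccess;
    }

    if (ok) {
        const int threads = 256;
        const int blocks = (n + threads - 1) / threads;
        gather_tex<<<blocks, threads>>>(tex, d_idx, d_out, n);
        ok = cudaGetLastError() == cudaSuccess
          && cudaDeviceSynchronize() == cudaSuccess
          && cudaMemcpy(h_out.data(), d_out, fbytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    int mismatches = -1;
    if (ok) {
        mismatches = 0;
        for (int i = 0; i < n; ++i) {
            if (h_out[i] != h_data[h_idx[i]]) ++mismatches;
        }
    }

    if (tex) cudaDestroyTextureObject(tex);
    cudaFree(d_data);
    cudaFree(d_out);
    cudaFree(d_idx);
    return mismatches;
}
```

This only checks that the texture path returns the right values. Whether it is actually faster than plain global loads depends on the access pattern and data size, as described above, so time both variants on your own data before committing to either.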
Robert Crovella