I am developing a set of mathematical functions and implementing them both for CPUs and for GPUs (with CUDA).
Some of these functions are based on lookup tables. Most tables take up 4 KB; some are slightly larger. A lookup-table-based function takes an input, selects one or two entries of the lookup table, and then computes the result by interpolation or a similar method.
Now my question is: where should I store these lookup tables? A CUDA device has many places to store values (global memory, constant memory, texture memory, ...). Given that each table can be read concurrently by many threads, and that the input values, and therefore the lookup indices, can be completely uncorrelated across the threads of each warp (leading to uncoalesced memory accesses), which memory provides the fastest access?
I should add that the contents of these tables are precomputed and completely constant.
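To make the access pattern concrete, here is a minimal sketch of the kind of function described above, assuming a single 4 KB table of 1024 floats placed in constant memory; the names `lookupTable` and `interpKernel` are illustrative, not from the question.

```cuda
#include <cuda_runtime.h>

// Hypothetical 4 KB table: 1024 single-precision values (illustrative).
__constant__ float lookupTable[1024];

// Linear interpolation between two neighbouring table entries.
// Each thread may compute an unrelated index, so reads within a warp
// can be uncorrelated, as described in the question.
__global__ void interpKernel(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float x  = in[i] * 1023.0f;      // map input (assumed in [0,1]) onto the table
    int   lo = min((int)x, 1022);    // index of the lower of the two entries
    float t  = x - (float)lo;        // fractional part used for interpolation
    out[i] = lookupTable[lo] * (1.0f - t) + lookupTable[lo + 1] * t;
}

// Host side: copy the precomputed table into constant memory once, e.g.:
//   cudaMemcpyToSymbol(lookupTable, hostTable, sizeof(float) * 1024);
```

One caveat worth noting: constant memory is fastest when all threads of a warp read the same address; with uncorrelated indices the reads are serialized, so binding the table to texture memory or reading it through the read-only cache (`__ldg` on a `const float* __restrict__` pointer in global memory) may be the better fit for this access pattern.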
EDIT
Just to clarify: I need to store about 10 different lookup tables of 4 KB each. In any case, it would be great to know whether the solution would stay the same for, say, 100 tables of 4 KB, or 10 lookup tables of 16 KB each.
Spiros