My program needs to make maximum use of the GPU.
Does blockDim.x * blockIdx.x + threadIdx.x address every thread, or is it also necessary to use .y and .z?
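From section 2.2 (Thread Hierarchy) of the CUDA programming guide: threadIdx is a 3-component vector, so threads can be identified using a one-dimensional, two-dimensional, or three-dimensional thread index, forming a one-dimensional, two-dimensional, or three-dimensional block of threads. This provides a natural way to invoke computation across the elements of a domain such as a vector, matrix, or volume.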
Source: http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#thread-hierarchy
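So no, .y and .z are not required. If the kernel is launched with a one-dimensional grid of one-dimensional blocks, .x alone identifies every thread. The number of dimensions can be 1, 2, or 3, whichever fits your data best.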
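
To illustrate the point about .x being enough for a one-dimensional launch, here is a minimal sketch. The kernel name markVisited, the helper countUnvisited, the block size of 256, and the element count of 1000 are all assumptions chosen for the example, not something from the question. Each thread marks the element at its global index, and the host then counts the elements that no thread reached.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Each thread computes its global 1D index and marks the element it owns.
__global__ void markVisited(int *visited, int n)
{
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) {                 // the last block may contain spare threads
        visited[i] = 1;
    }
}

// Hypothetical helper: launches a 1D grid over n elements and
// returns how many elements were never touched by any thread.
int countUnvisited(int n)
{
    int *d_visited = nullptr;
    cudaMalloc(&d_visited, n * sizeof(int));
    cudaMemset(d_visited, 0, n * sizeof(int));

    int threadsPerBlock = 256;   // assumed block size
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;  // round up
    markVisited<<<blocks, threadsPerBlock>>>(d_visited, n);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    std::vector<int> h_visited(n);
    cudaMemcpy(h_visited.data(), d_visited, n * sizeof(int),
               cudaMemcpyDeviceToHost);
    cudaFree(d_visited);

    int missing = 0;
    for (int v : h_visited) {
        if (v == 0) ++missing;
    }
    return missing;
}

int main()
{
    printf("unvisited elements: %d\n", countUnvisited(1000));
    return 0;
}
```

With a 1D launch like this, blockDim.y, blockDim.z, gridDim.y, and gridDim.z are all 1, and the .y and .z components of threadIdx and blockIdx are all 0, so they add nothing to the index. Using 2D or 3D launches is a convenience for matrices and volumes, not a way to reach more of the GPU.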