Coalescing in global memory writes

In CUDA, is coalescing of global memory writes as important as coalescing of global memory reads? If so, how can this be explained? Are there also differences between the earliest generations of CUDA devices and the latest ones on this issue?

+4
2 answers

Coalesced writes (or the lack of them) can affect performance in the same way as coalesced reads (or the lack of them).

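A coalesced read occurs when a read request triggered by a warp instruction, for example:

int i = my_int_data[threadIdx.x+blockDim.x*blockIdx.x];

can be satisfied by a single read transaction in the memory controller (which essentially means that all of the individual thread reads come from a single cache line).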

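A coalesced write occurs when a write request triggered by a warp instruction, for example:

my_int_data[threadIdx.x+blockDim.x*blockIdx.x] = i;

can be satisfied by a single write transaction in the memory controller.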

For the two access patterns shown above, there are no differences between device generations.

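However, there are other kinds of reads or writes that can collapse into a single transaction on later devices (cc 2.0 and later), but not on earlier ones. One example is a "broadcast read":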

int i = my_int_data[0];

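Here all threads read from the same global memory location. On newer devices, such a read is "broadcast" to all threads in a single transaction. On some earlier devices, it results in the threads being serviced serially. This example probably has no counterpart for writes, because several threads writing to one location gives undefined behavior. A "scrambled" write, however, may coalesce on newer devices but not on older ones: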

my_int_data[(threadIdx.x+5)%32] = i;

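Note that all of the writes above are unique (within the warp) and belong to a single cache line, but they do not satisfy the coalescing requirements on cc 1.0 or 1.1 devices, although they should coalesce on newer devices.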

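If you read the description of global memory accesses for cc 1.0 and 1.1 devices and compare it with the one for later devices, you will see that some of the coalescing requirements on earlier devices have been relaxed on later ones.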

+6

, . Coalescing , , , , L1 L2 .

0
