Memory coalescing in global writes

Question

In CUDA, is coalescing of global memory writes as important as coalescing of global memory reads? If so, how can this be explained? Are there also differences between the earliest generations of CUDA devices and the latest ones on this issue?

+4

gpu gpgpu cuda kepler

Farzad Nov 25 '13 at 7:19

2 answers

, . Coalescing , , , , L1 L2 .

0

Levi Barnes Nov 25 '13 at 17:42

Robert Crovella · Accepted Answer · 2013-11-25T14:54:44+0000

Coalesced writes (or the lack thereof) can affect performance, just as coalesced reads (or the lack thereof) can.

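A coalesced read occurs when a read request triggered by a warp instruction, for example:

int i = my_int_data[threadIdx.x+blockDim.x*blockIdx.x];

can be satisfied by a single read transaction in the memory controller (which is to say, in essence, that all of the individual thread reads come from a single cache line).

Similarly, a coalesced write occurs when a write request triggered by a warp instruction, for example:

my_int_data[threadIdx.x+blockDim.x*blockIdx.x] = i;

can be satisfied by a single write transaction in the memory controller.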
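As a rough illustration, the two access patterns above can be placed in complete kernels. This is only a sketch: the kernel names `copy_coalesced` and `write_strided`, the host helper `run_write_patterns`, and the strided variant itself are assumptions added here for contrast and are not part of the original answer.

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <vector>

// Hypothetical kernel: each thread reads and writes the element at its own
// global index, so one warp touches 32 consecutive ints.
__global__ void copy_coalesced(const int *in, int *out, int n)
{
    int idx = threadIdx.x + blockDim.x * blockIdx.x;
    if (idx < n)
        out[idx] = in[idx];  // coalesced read, coalesced write
}

// Hypothetical contrast: neighbouring threads write `stride` ints apart, so
// for stride > 1 a warp's writes are spread over several cache lines.
__global__ void write_strided(const int *in, int *out, int n, int stride)
{
    int idx = threadIdx.x + blockDim.x * blockIdx.x;
    if (idx < n)
        out[(size_t)idx * stride] = in[idx];  // coalesced read, scattered write
}

// Illustrative host helper: runs both kernels and checks their results.
// Returns false on any CUDA error or data mismatch.
bool run_write_patterns(int n, int stride)
{
    const size_t bytes_in = (size_t)n * sizeof(int);
    const size_t bytes_b  = (size_t)n * stride * sizeof(int);
    std::vector<int> h_in(n), h_a(n), h_b((size_t)n * stride);
    for (int k = 0; k < n; ++k) h_in[k] = k;

    int *d_in = nullptr, *d_a = nullptr, *d_b = nullptr;
    bool ok = true;
    ok = ok && cudaMalloc(&d_in, bytes_in) == cudaSuccess;
    ok = ok && cudaMalloc(&d_a, bytes_in) == cudaSuccess;
    ok = ok && cudaMalloc(&d_b, bytes_b) == cudaSuccess;
    ok = ok && cudaMemcpy(d_in, h_in.data(), bytes_in,
                          cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        const int threads = 256;
        const int blocks = (n + threads - 1) / threads;
        copy_coalesced<<<blocks, threads>>>(d_in, d_a, n);
        write_strided<<<blocks, threads>>>(d_in, d_b, n, stride);
        ok = cudaDeviceSynchronize() == cudaSuccess;
    }
    ok = ok && cudaMemcpy(h_a.data(), d_a, bytes_in,
                          cudaMemcpyDeviceToHost) == cudaSuccess;
    ok = ok && cudaMemcpy(h_b.data(), d_b, bytes_b,
                          cudaMemcpyDeviceToHost) == cudaSuccess;
    for (int k = 0; ok && k < n; ++k)
        ok = (h_a[k] == k) && (h_b[(size_t)k * stride] == k);

    cudaFree(d_in);
    cudaFree(d_a);
    cudaFree(d_b);
    return ok;
}
```

Both kernels produce correct data; the difference shows up only in how many memory transactions each warp generates. Timing the two kernels, or inspecting them with a profiler, would be the way to observe that on a given device.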

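For the examples shown above, there are no differences between device generations. However, there are other kinds of reads or writes that can be coalesced (that is, collapsed into a single memory controller transaction) on later devices but not on earlier ones. One example is a "broadcast read":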

int i = my_int_data[0];

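In the read above, all threads read from the same global location. On newer devices, such a read is "broadcast" to all threads in a single transaction. On some earlier devices, it would instead be serviced by serializing the threads. This example probably has no counterpart for writes, because when multiple threads write to a single location, which write takes effect is undefined. However, a "scrambled" write may coalesce on newer devices but not on older ones: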

my_int_data[(threadIdx.x+5)%32] = i;

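Note that all of the writes above are unique (within the warp) and fall within a single cache line, yet they do not satisfy the coalescing requirements of cc 1.0 or 1.1 devices, although they should coalesce on newer devices.

If you read the description of global memory access for cc 1.0 and 1.1 devices and compare it with the description for later devices, you will see that some of the coalescing requirements of the earlier devices have been relaxed on the later ones.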
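To make the comparison concrete, here is a small host-side model, compiled alongside the "scrambled" write wrapped in a kernel. It counts how many distinct 128-byte segments one warp touches for a given index pattern. This is a sketch under stated assumptions: the names `scrambled_write`, `segments_touched`, `kWarpSize` and `kSegmentBytes` are hypothetical, the array is assumed to be 128-byte aligned, and the 128-byte granularity is an assumption that may not match the transaction size of a particular device. The model reflects only the relaxed rule (accesses falling in the same segment can be combined); it does not reproduce the stricter cc 1.0/1.1 rules.

```cuda
#include <cstdint>
#include <set>

constexpr int kWarpSize = 32;
constexpr std::uintptr_t kSegmentBytes = 128;  // assumed granularity

// The "scrambled" write from the answer, wrapped in a kernel (hypothetical
// name). It is meant to be launched with a single warp, e.g.
//   scrambled_write<<<1, 32>>>(d_data, 7);
__global__ void scrambled_write(int *my_int_data, int i)
{
    my_int_data[(threadIdx.x + 5) % 32] = i;
}

// Host-side model: count the distinct 128-byte segments touched when lane t
// of a warp accesses element index_of(t) of an aligned int array.
template <typename IndexFn>
int segments_touched(IndexFn index_of)
{
    std::set<std::uintptr_t> segments;
    for (int lane = 0; lane < kWarpSize; ++lane) {
        std::uintptr_t byte_offset =
            static_cast<std::uintptr_t>(index_of(lane)) * sizeof(int);
        segments.insert(byte_offset / kSegmentBytes);
    }
    return static_cast<int>(segments.size());
}
```

Under this model, the in-order pattern `t`, the scrambled pattern `(t + 5) % 32`, and the broadcast pattern `0` each touch one segment, while a pattern such as `t * 32` touches 32 segments, one per lane.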
