Coalesced writes (or the lack of them) can affect performance in the same way as coalesced reads (or the lack of them).
A coalesced read occurs when a read request issued by a warp instruction, for example:
int i = my_int_data[threadIdx.x+blockDim.x*blockIdx.x];
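can be satisfied by a single read transaction in the memory controller (that is, the addresses requested by the threads all fall within one aligned memory segment).
Likewise, a coalesced write occurs when a write request issued by a warp instruction, for example: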
my_int_data[threadIdx.x+blockDim.x*blockIdx.x] = i;
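can be satisfied by a single write transaction in the memory controller. On compute capability 1.x devices, writes follow the same coalescing rules as reads.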
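The host-side C++ sketch below illustrates why the index `threadIdx.x+blockDim.x*blockIdx.x` behaves well for both the read and the write above. It assumes a simplified cost model, not real profiler output: a warp pays one transaction per distinct 128-byte aligned segment it touches. The helper names `segments_touched`, `contiguous_index` and `check_examples`, the launch values, and the strided pattern used for contrast are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Simplified cost model (an assumption for illustration, not a hardware spec):
// one memory transaction per distinct 128-byte aligned segment touched by a warp.
constexpr std::size_t kWarpSize = 32;
constexpr std::size_t kSegmentBytes = 128;

// Counts the segments touched when thread t of a warp accesses int element index(t).
// Assumes the array starts at a segment-aligned address (cudaMalloc allocations
// are aligned to at least 256 bytes).
template <typename IndexFn>
std::size_t segments_touched(IndexFn index) {
    std::set<std::size_t> segments;
    for (std::size_t t = 0; t < kWarpSize; ++t) {
        const std::size_t byte_offset = index(t) * sizeof(int);
        segments.insert(byte_offset / kSegmentBytes);
    }
    return segments.size();
}

// The index from the examples above; tid plays the role of threadIdx.x
// (assumed to be a lane of the block's first warp, with blockDim.x a multiple of 32).
std::size_t contiguous_index(std::size_t tid, std::size_t block_dim, std::size_t block_idx) {
    return tid + block_dim * block_idx;
}

void check_examples() {
    // First warp of block 3 with blockDim.x = 256 (hypothetical launch values).
    assert(segments_touched([](std::size_t t) { return contiguous_index(t, 256, 3); }) == 1);
    // Hypothetical strided pattern: each thread skips 32 ints.
    assert(segments_touched([](std::size_t t) { return t * 32; }) == 32);
}
```

Under this model the contiguous index touches a single segment per warp, while a stride of 32 ints spreads the same warp over 32 segments.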
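Reads have one special case, in which every thread reads the same address; on devices of compute capability 1.2 and later this is served with a single transaction. This is a "broadcast" read: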
int i = my_int_data[0];
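Every thread in the warp receives the same value. There is no useful "broadcast" counterpart for writes: if several threads of a warp write to the same location, only one write determines the final value, and which thread performs it is undefined. What matters for writes is the pattern of distinct addresses, for example a "permuted" write: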
my_int_data[(threadIdx.x+5)%32] = i;
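Here every thread still writes a distinct element of the same 32-element block, only in rotated order. On devices of compute capability 1.2 and later this write is coalesced (the addresses still fall within one aligned segment), but on devices of compute capability 1.0 and 1.1 it is not.
On cc 1.0 and 1.1 devices, a half-warp's access is coalesced only if the k-th thread accesses the k-th word of an aligned segment; otherwise each thread's access is issued as a separate transaction.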
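To make the compute-capability difference concrete, here is a hypothetical C++ model of the cc 1.x coalescing rules for 4-byte words, evaluated per half-warp (the names `transactions_cc10`, `transactions_cc12`, `warp_pattern` and `check_examples` are made up for illustration). It is a sketch under simplifying assumptions: all threads are active, the array starts at an aligned address (as `cudaMalloc` allocations do), and the cc 1.2/1.3 transaction-size reduction is ignored, so only the number of transactions is counted.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <set>

// Sketch of the compute capability 1.x global-memory coalescing rules for
// 4-byte words; coalescing is evaluated separately for each half-warp.
static_assert(sizeof(int) == 4, "model assumes 4-byte words");
constexpr std::size_t kHalfWarp = 16;
using WarpIndices = std::array<std::size_t, 32>;  // int index accessed by each lane

// cc 1.0/1.1: one transaction only if lane k accesses the k-th word of a
// 64-byte aligned segment; otherwise one transaction per thread.
std::size_t transactions_cc10(const WarpIndices& idx) {
    std::size_t total = 0;
    for (std::size_t h = 0; h < 2; ++h) {
        const std::size_t first = idx[h * kHalfWarp] * sizeof(int);
        bool sequential = (first % 64 == 0);
        for (std::size_t k = 0; k < kHalfWarp && sequential; ++k)
            sequential = (idx[h * kHalfWarp + k] * sizeof(int) == first + k * sizeof(int));
        total += sequential ? 1 : kHalfWarp;
    }
    return total;
}

// cc 1.2/1.3: one transaction per distinct 128-byte segment touched by the
// half-warp, regardless of the order in which threads access it.
std::size_t transactions_cc12(const WarpIndices& idx) {
    std::size_t total = 0;
    for (std::size_t h = 0; h < 2; ++h) {
        std::set<std::size_t> segments;
        for (std::size_t k = 0; k < kHalfWarp; ++k)
            segments.insert(idx[h * kHalfWarp + k] * sizeof(int) / 128);
        total += segments.size();
    }
    return total;
}

// Builds the per-lane indices for an index expression f(lane).
template <typename F>
WarpIndices warp_pattern(F f) {
    WarpIndices idx{};
    for (std::size_t lane = 0; lane < idx.size(); ++lane) idx[lane] = f(lane);
    return idx;
}

// Example checks; lane plays the role of threadIdx.x in the first warp of a 1-D block.
void check_examples() {
    const WarpIndices contiguous = warp_pattern([](std::size_t lane) { return lane; });
    const WarpIndices broadcast = warp_pattern([](std::size_t) { return std::size_t{0}; });
    const WarpIndices rotated = warp_pattern([](std::size_t lane) { return (lane + 5) % 32; });
    assert(transactions_cc10(contiguous) == 2 && transactions_cc12(contiguous) == 2);
    assert(transactions_cc10(broadcast) == 32 && transactions_cc12(broadcast) == 2);
    assert(transactions_cc10(rotated) == 32 && transactions_cc12(rotated) == 2);
}
```

In this model the contiguous pattern costs two transactions per warp under both rule sets, whereas the broadcast read and the rotated write each cost 32 under the cc 1.0/1.1 rules but only two under the cc 1.2/1.3 rules.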