A `bsxfun`-based vectorized solution that (ab)uses how the trace is defined: the sum of the diagonal elements.
%// Get size of A
[m,n,r] = size(A)
%// Get indices of the diagonal elements for each 3D "slice" as columns of idx
idx = bsxfun(@plus,[1:m+1:m*n]',[0:r-1]*m*n) %//'
%// Thus, for your 3 x 3 x 3 case, idx would be:
%// idx =
%//      1    10    19
%//      5    14    23
%//      9    18    27
%// These are the linear indices of the diagonal elements of each 3D slice.
%// Index into A with idx and sum along columns to get each element of the desired output
B = sum(A(idx),1)
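In column-major order, element (i,i) of slice k sits at linear index 1 + (i-1)(m+1) + (k-1)mn, which is exactly what `idx` enumerates. A quick sanity check, assuming a made-up random 3 x 3 x 3 integer array, compares the vectorized result against calling `trace` on each slice in a loop:

```matlab
%// Hypothetical test data: a 3 x 3 x 3 array of random integers
A = randi(10, 3, 3, 3);
[m,n,r] = size(A);

%// Vectorized: linear indices of each slice's diagonal, one column per slice
idx = bsxfun(@plus, (1:m+1:m*n)', (0:r-1)*m*n);
B = sum(A(idx), 1);

%// Reference: loop over the slices and call trace on each one
B_loop = zeros(1, r);
for k = 1:r
    B_loop(k) = trace(A(:,:,k));
end

%// Integer-valued doubles, so the two results should match exactly
disp(isequal(B, B_loop))
```

Because the test values are integers, `isequal` is a fair check here; with general floating-point data a small tolerance would be safer.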
If you would rather not clutter the workspace with extra variables, skip `idx` and use:
B = sum(A(bsxfun(@plus,[1:m+1:m*n]',[0:r-1]*m*n)),1)
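On MATLAB R2016b and later (and in Octave), implicit expansion lets `+` combine a column vector with a row vector directly, so `bsxfun` is not strictly required. A minimal sketch of the same one-liner under that assumption, using a hypothetical 4 x 4 x 5 test array:

```matlab
%// Assumes implicit expansion (MATLAB R2016b+ or Octave)
A = randi(10, 4, 4, 5);            %// hypothetical test data
[m,n,r] = size(A);

%// Column of diagonal offsets + row of per-slice offsets -> m x r index matrix
B = sum(A((1:m+1:m*n)' + (0:r-1)*m*n), 1);

%// Same computation with bsxfun, for comparison
B_ref = sum(A(bsxfun(@plus,(1:m+1:m*n)',(0:r-1)*m*n)),1);
disp(isequal(B, B_ref))
```

On older releases the `+` line would error with a dimension mismatch, which is why the `bsxfun` form above is the more portable choice.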
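To work with GPUs, you can define the input data as a gpuArray with `gpuArray(A)`, compute `B` on the GPU, and finally gather the result back onto the CPU with `gather(..)`.
Thus, the full code would look like this: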
[m,n,r] = size(A); %// Get size
gpu_A = gpuArray(A); %// copy data from CPU to GPU
%// Perform calculations on GPU
gpu_B = sum(gpu_A(bsxfun(@plus,[1:m+1:m*n]',[0:r-1]*m*n)),1); %//'
B = gather(gpu_B); %// get back output onto CPU
Quick test: on a GTX 750 Ti, this appeared to give roughly a 3x speedup.
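To time the approaches on your own hardware, something along these lines could work. It is only a rough sketch: the 100 x 100 x 2000 array size is an arbitrary assumption, `tic`/`toc` timings are noisy, and the GPU branch runs only if Parallel Computing Toolbox is installed and reports at least one device. The GPU timing includes the host-to-device copy and `gather`, which also forces the asynchronous GPU work to finish.

```matlab
%// Rough timing sketch; the array size is an arbitrary assumption
A = rand(100, 100, 2000);
[m,n,r] = size(A);

%// Loop-based reference: trace of each slice
tic;
B_loop = zeros(1, r);
for k = 1:r
    B_loop(k) = trace(A(:,:,k));
end
t_loop = toc;

%// Vectorized bsxfun version on the CPU
tic;
B_cpu = sum(A(bsxfun(@plus,(1:m+1:m*n)',(0:r-1)*m*n)),1);
t_cpu = toc;
fprintf('loop: %.4f s, bsxfun (CPU): %.4f s\n', t_loop, t_cpu);

%// GPU version, only if Parallel Computing Toolbox and a GPU are available
if exist('gpuDeviceCount') > 0 && gpuDeviceCount > 0
    tic;
    gpu_A = gpuArray(A);
    B_gpu = gather(sum(gpu_A(bsxfun(@plus,(1:m+1:m*n)',(0:r-1)*m*n)),1));
    t_gpu = toc;
    fprintf('bsxfun (GPU, incl. transfers): %.4f s\n', t_gpu);
end
```

Since the data is floating point and the summation order may differ between `trace` and a column-wise `sum`, compare results with a small tolerance rather than `isequal`.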