In each iteration of the loop, I compute a sub-matrix. All of these sub-matrices must be combined into one final matrix. I know the dimensions of this final matrix before entering the loop, so I assumed that preallocating the matrix with the zeros function would be faster than initializing an empty array and simply appending sub-matrices at each iteration of my loop. Oddly enough, my program runs MUCH slower when I preallocate. Here is the code (only the first and last lines differ):
This is slow:
w_cuda = zeros(w_rows, w_cols, f_cols);
for j = 0:num_groups-1
    % Get # of rows & cols in W. The last group is a special
    % case because it may have fewer than max_row_size rows.
    if (j == num_groups-1 && mod(w_rows, max_row_size) ~= 0)
        num_rows_sub = w_rows - (max_row_size * j);
    else
        num_rows_sub = max_row_size;
    end
    % Calculate correct W and f matrices
    start_index = (max_row_size * j) + 1;
    end_index = start_index + num_rows_sub - 1;
    w_sub = W(start_index:end_index,:);
    f_sub = filterBank(start_index:end_index,:);
    % Obtain sub-matrix
    w_cuda_sub = nopack_cu(w_sub, f_sub);
    % Incorporate sub-matrix into final matrix
    w_cuda(start_index:end_index,:,:) = w_cuda_sub;
end
This is fast:
w_cuda = [];
for j = 0:num_groups-1
    % Get # of rows & cols in W. The last group is a special
    % case because it may have fewer than max_row_size rows.
    if (j == num_groups-1 && mod(w_rows, max_row_size) ~= 0)
        num_rows_sub = w_rows - (max_row_size * j);
    else
        num_rows_sub = max_row_size;
    end
    % Calculate correct W and f matrices
    start_index = (max_row_size * j) + 1;
    end_index = start_index + num_rows_sub - 1;
    w_sub = W(start_index:end_index,:);
    f_sub = filterBank(start_index:end_index,:);
    % Obtain sub-matrix
    w_cuda_sub = nopack_cu(w_sub, f_sub);
    % Incorporate sub-matrix into final matrix
    w_cuda = [w_cuda; w_cuda_sub];
end
One other potentially useful piece of information: my final matrix is 3D, and the numbers inside it are complex. As always, any insight is appreciated.
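Since the complex data may matter, here is a stripped-down sketch of the two preallocation styles I am comparing. Note that MATLAB's zeros returns a real-valued array, so the first assignment of complex values may force a conversion of the whole array; the complex function preallocates complex storage directly. The sizes and the rand-based loop body below are made up for illustration and do not reflect my actual nopack_cu call:

    % Real-valued preallocation: the first complex write converts A.
    n = 500;
    tic;
    A = zeros(n, n, 10);
    for k = 1:10
        A(:,:,k) = rand(n) + 1i*rand(n);
    end
    toc;

    % Complex preallocation up front: no conversion on first write.
    tic;
    B = complex(zeros(n, n, 10));
    for k = 1:10
        B(:,:,k) = rand(n) + 1i*rand(n);
    end
    toc;

I have not yet confirmed whether this conversion accounts for the slowdown in my real code.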
performance memory-management arrays memory matlab
nedblorf