Vectorization: friend or foe? bsxfun / arrayfun to avoid loops, repmat, permutation, compression, etc.

This question is related to this question and possibly to this other .

Suppose you have two matrices A and B. A is M-by-N and B is N-by-K. I want to get a matrix M in K such that C(i, j) = 1 - prod(1 - A(i, :)' .* B(:, j)). I tried some solutions in Matlab - I compare their computing performance here.

% Size of matrices:
M = 4e3;
N = 5e2;
K = 5e1;

GG = 50;    % GG instances
rntm1 = zeros(GG, 1);    % running time of first algorithm
rntm2 = zeros(GG, 1);    % running time of second algorithm
rntm3 = zeros(GG, 1);    % running time of third algorithm
rntm4 = zeros(GG, 1);    % running time of fourth algorithm
rntm5 = zeros(GG, 1);    % running time of fifth algorithm
for gg = 1:GG

    A = rand(M, N);    % M-by-N matrix of random numbers
    A = A ./ repmat(sum(A, 2), 1, N);    % M-by-N matrix of probabilities (?)
    B = rand(N, K);    % N-by-K matrix of random numbers
    B = B ./ repmat(sum(B), N, 1);    % N-by-K matrix of probabilities (?)

    %% First solution
    % One-liner solution:
    tic
    C = squeeze(1 - prod(1 - repmat(A, [1 1 K]) .* permute(repmat(B, [1 1 M]), [3 1 2]), 2));
    rntm1(gg) = toc;


    %% Second solution
    % Full vectorization, using meshgrid, arrayfun and reshape (from Luis Mendo, second link above)
    tic
    [ii jj] = meshgrid(1:size(A, 1), 1:size(B, 2));
    D = arrayfun(@(n) 1 - prod(1 - A(ii(n), :)' .* B(:, jj(n))), 1:numel(ii));
    D = reshape(D, size(B, 2), size(A, 1)).';
    rntm2(gg) = toc;
    clear ii jj

    %% Third solution
    % Partial vectorization 1
    tic
    E = zeros(M, K);
    for hh = 1:M
      tmp = repmat(A(hh, :)', 1, K);
      E(hh, :) = 1 - prod((1 - tmp .* B), 1);
    end
    rntm3(gg) = toc;
    clear tmp hh

    %% Fourth solution
    % Partial vectorization 2
    tic
    F = zeros(M, K);
    for hh = 1:M
      for ii = 1:K
        F(hh, ii) = 1 - prod(1 - A(hh, :)' .* B(:, ii));
      end
    end
    rntm4(gg) = toc;
    clear hh ii

    %% Fifth solution
    % No vectorization at all
    tic
    G = ones(M, K);
    for hh = 1:M
      for ii = 1:K
        for jj = 1:N
          G(hh, ii) = G(hh, ii) * prod(1 - A(hh, jj) .* B(jj, ii));
        end
        G(hh, ii) = 1 - G(hh, ii);
      end
    end
    rntm5(gg) = toc;
    clear hh ii jj C D E F G

end

prctile([rntm1 rntm2 rntm3 rntm4 rntm5], [2.5 25 50 75 97.5])
%    3.6519    3.5261    0.5912    1.9508    2.7576
%    5.3449    6.8688    1.1973    3.3744    3.9940
%    8.1094    8.7016    1.4116    4.9678    7.0312
%    8.8124   10.5170    1.9874    6.1656    8.8227
%    9.5881   12.0150    2.1529    6.6445    9.5115

mean([rntm1 rntm2 rntm3 rntm4 rntm5])
%    7.2420    8.3068    1.4522    4.5865    6.4423

std([rntm1 rntm2 rntm3 rntm4 rntm5])
%    2.1070    2.5868    0.5261    1.6122    2.4900

The solutions are equivalent, but partial vector algorithms are more efficient in terms of memory and runtime. Even a triple loop seems better than arrayfun! Is there any approach that is actually better than a third, only partially vectorized solution?

EDIT: Dan . rntm6, rntm7 rntm8 - , . :

prctile(rntm6, [2.5 25 50 75 97.5])
%    0.6337    0.6377    0.6480    0.7110    1.2932
mean(rntm6)
%    0.7440
std(rntm6)
%    0.1970

prctile(rntm7, [2.5 25 50 75 97.5])
%    0.6898    0.7130    0.9050    1.1505    1.4041
mean(rntm7)
%    0.9313
std(rntm7)
%    0.2276

prctile(rntm8, [2.5 25 50 75 97.5])
%    0.5949    0.6005    0.6036    0.6370    1.3529
mean(rntm8)
%    0.6753
std(rntm8)
%    0.1890
+4
2

bsxfun:

E = zeros(M, K);
for hh = 1:M
  E(hh, :) = 1 - prod((1 - bsxfun(@times, A(hh,:)', B)), 1);
end

() :

E = squeeze(1 - prod((1-bsxfun(@times, permute(B, [3 1 2]), A)),2));

:

E = zeros(M, K);
At = A';
for hh = 1:M
  E(hh, :) = 1 - prod((1 - bsxfun(@times, At(:,hh), B)), 1);
end
+3

All Articles