A faster way to take xor between one row and a matrix of 1 and 0?

I have a row of data, say A = [0 1 1 1 0 0].

Matrix B contains many rows. For a fictitious example, we can say that it is simple B = [1 1 1 0 1 0; 1 0 0 1 0 1].

I want to find the number of columns in which A and row B are different, and use this difference vector to find which row B is most similar to A. So, for the example above, A is different from B(1,:)columns 1, 4, 5 = 3 of the total difference . A is different from B (2, :) in columns 1, 2, 3, 6 = 4 of the full differences, and so I would like to return index 1 to indicate that A is most similar to B (1, :).

In fact, B has ~ 50,000 rows , and A and B have about 800 columns . My current code is to find the most similar lines below:

min(sum(xor(repmat(A,B_rows,1),B),2));

It works, but it is very slow. Any understanding of which function takes so long and how to improve it?

+4
source share
5 answers

There are 6 more or less similar approaches with bsxfun, it is hard to say which one is the fastest.

%example data
A = [0 1 1 1 0 0];
B = perms(A);     %720x6 matrix

Similarity Count:

Z = sum(  bsxfun(@eq,  A,B) , 2);
Z = sum( ~bsxfun(@ne,  A,B) , 2);
Z = sum( ~bsxfun(@xor, A,B) , 2);

Difference Counting:

Z = sum( ~bsxfun(@eq,  A,B) , 2);
Z = sum(  bsxfun(@ne,  A,B) , 2);
Z = sum(  bsxfun(@xor, A,B) , 2);

Zis a vector containing the number of equal / unequal elements Afor each row B.

Benchmark for 100 trials each (in the same order as above):

t100 =

    0.0081
    0.0098
    0.0123
    0.0102
    0.0087
    0.0111
+3
source

pdist2 'hamming'

[D I] = pdist2( A, B, 'hamming', 'Smallest', 1 );
+3

( Amro Benchmark-)

function [t] = bench()
    A = [0 1 1 1 0 0];
    B = perms(A);

    % functions to compare
    fcns = {
        @() compare1(A,B);
        @() compare2(A,B);
        @() compare3(A,B);
        @() compare4(A,B);
    };

    % timeit
    t = zeros(4,1);
    for ii = 1:100;
        t = t + cellfun(@timeit, fcns);
    end
end

function Z = compare1(A,B)  %thewaywewalk
    Z = sum(  bsxfun(@eq,  A,B) , 2);
end
function Z = compare2(A,B)  %Matt J
    Z = sum(bsxfun(@xor, A, B),2);
end
function Z = compare3(A,B)  %Luis Mendo
    A = logical(A);
    Z = sum(B(:,~A),2) + sum(~B(:,A),2);
end
function Z = compare4(A,B)  %Shai
     Z = pdist2( A, B, 'hamming', 'Smallest', 1 );
end

100 720x6 B:

0.0068s   %thewaywewalk
0.0092s   %Matt J
0.0077s   %Luis Mendo
0.0399s   %Shai

100 40320x8 B:

0.0889s   %thewaywewalk
0.1512s   %Matt J
0.4571s   %Luis Mendo
0.4578s   %Shai

bsxfun , @eq.

+2

, bsxfun - ,

min(sum(bsxfun(@xor, A, B),2))
+1

, xor ( , , ):

A = logical(A);
min(sum(B(:,~A),2) + sum(~B(:,A),2))

, ~ B:

A = logical(A);
min(sum(B(:,~A),2) - sum(B(:,A),2)) + nnz(A)
+1
source

All Articles