Faster Iterations - MATLAB

Consider 2 vectors A = [20000000 x 1] and B = [20000000 x 1 ]

I would need to find the sum of all A corresponding to each unique element from B.

Although it looks very simple, it happens forever in MATLAB.

I am currently using

 u = unique(B); length_u = length(u); C = zeros(length_u,1); for i = 1:length_u C(i,1) = sum(A(B==u(i))); end 

Is there a way to make it work faster? I tried to break the loop and run the parfor loops using the parallel computing toolkit (because I have only 2 cores). One more hour.

PS: Yes, I have to get a better computer.

+7
performance vectorization matlab
source share
4 answers

You should see this answer first .
If you need, you can use a combination of histc and accumarray

 A = randi( 500, 1, 100000 ); B = randi( 500, 1, 100000 ); ub = unique( B ); [ignore idx] = histc( B, [ub-.5 ub(end)+.5] ); C = accumarray( idx', A' )'; 

see a comparison of toys with a naive implementation of for -loop on ideone .

How it works?

We use the second histc output to map elements of B (and later A ) to cells defined by ub elements (unique elements of B ).
accumarray then used to sum all A records with reference to the mapping defined by idx .
Note. I assume that the unique elements of B at least 0.5 from each other.

+6
source share

If B contains only integers, you can do this easily on a single line, using the fact that sparse adds elements with the same index:

 C = nonzeros(sparse(B,1,A)); 
+3
source share

Further simplification of the code suggested by Shay :

 A = randi( 500, 1, 100000 ); B = randi( 500, 1, 100000 ); [~,~,idb] = unique( B ); C = accumarray( idb', A' )'; 

Here "idb" gives a vector similar to the "idx" vector in the code suggested by Shay .

+3
source share

I changed the amount. Instead of checking each element, but it is suitable for the case ( B==u(i) ) or not, I sorted the array and stopped the moment the element changed. When you start the next amount from this item. Thus, I only needed to length_u over each element in instead of length_u times. Here is the code I used:

 A= rand(100000,1); B= round(rand(100000,1)*25000); u = unique(B); length_u = length(u); C = zeros(length_u,1); E = zeros(length_u,1); tic; for k = 1:length_u C(k,1) = sum(A(B==u(k))); end t_OP=toc; tic D= sortrows([A,B],2); n=1; for l=1:numel(u) m=n; while m<numel(B) && D(m+1,2)==u(l) m=m+1; end E(l,1) = sum(D(n:m,1)); n=m+1; end t_trial=toc; display(t_OP) display(t_trial) 

I also used your code. The elapsed time for your code was: t_OP=10.9398 and for my modification: t_trial=0.0962 . Hope this helps. I made sure the code worked when building sum(EC) , which was 0 .
EDIT: Speedtest
I compared it with @Shai's solution. This led to

 t_OP = 10.8147 t_trial = 0.0984 t_Shai = 0.0154 


EDIT: Comment by @moarningsun
Instead of using while -loop. You can use the second unique output if you sort the array before creating the sum.

 tic A = randi( 25000, 1, 100000 ); B = randi( 25000, 1, 100000 ); D= sortrows([A',B'],2); [u, idx] = unique(D(:,2)); idx = [idx; numel(D(:,2))+1]; for l=1:numel(u) E(l,1) = sum(D(idx(l):idx(l+1)-1,1)); end t_trial=toc; 
+1
source share

All Articles