Removing the inner loop, i.e.
    function aggArray = aggregate(array, groupIndex, collapseFn)
    groups = unique(groupIndex, 'rows');
    aggArray = nan(size(groups, 1), size(array, 2));
    for iGr = 1:size(groups, 1)
        grIdx = all(groupIndex == repmat(groups(iGr,:), [size(groupIndex,1), 1]), 2);
        aggArray(iGr,:) = collapseFn(array(grIdx,:));
    end
and calling the collapse function with an explicit dimension argument
    res = aggregate(a, b, @(x)sum(x,1));
already gives some speedup (3x on my machine) and avoids errors: for example, when sum or mean encounter a single row of data and no dimension argument is given, they collapse along that row instead of down the columns, so the result aggregates across columns rather than within the group.
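To see the pitfall in isolation, try a group that contains only one row (a toy example, not part of the benchmark above):

    x = [1 2 3];   % a single-row group
    sum(x)         % collapses along the row: returns 6
    sum(x, 1)      % keeps columns separate: returns [1 2 3]

This is why the anonymous wrapper `@(x)sum(x,1)` is used throughout instead of plain `@sum`.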
If you have only one group label vector, i.e. the same group labels for all columns of data, you can speed things up further:
    function aggArray = aggregate(array, groupIndex, collapseFn)
    ng = max(groupIndex);
    aggArray = nan(ng, size(array, 2));
    for iGr = 1:ng
        aggArray(iGr,:) = collapseFn(array(groupIndex == iGr,:));
    end
This version gives the same results for your example with a 6x speedup, but it cannot handle different group labels per data column.
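With a single label vector, a call to the version above looks like this (a small made-up example; labels must be positive integers because they are used directly as row indices):

    a = [1 2; 3 4; 5 6];
    g = [1; 1; 2];                   % one label per row, shared by all columns
    aggregate(a, g, @(x)sum(x,1))    % returns [4 6; 5 6]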
Here is a test with a 2D group index (groupIndex has 10 different columns here, one per signal):
    a = rand(20006, 10);
    B = [];
    % make random-length periods for each of the 10 signals
    for i = 1:size(a, 2)
        n0 = randi(10);
        b = transpose([ones(1, n0) 2*ones(1, 11-n0) sort(repmat((3:4001), [1 5]))]);
        B = [B b];
    end
    tic; erg0 = aggregate(a, B, @sum); toc           % original method
    tic; erg1 = aggregate2(a, B, @(x)sum(x,1)); toc  % just remove the inner loop
    tic; erg2 = aggregate3(a, B, @(x)sum(x,1)); toc  % use function below
    Elapsed time is 2.646297 seconds.
    Elapsed time is 1.214365 seconds.
    Elapsed time is 0.039678 seconds. (!!!!)
    function aggArray = aggregate3(array, groupIndex, collapseFn)
    % assumes rows with the same group label are contiguous in groupIndex,
    % so each group is exactly the index range ix1(iGr):ix2(iGr)
    [groups, ix1] = unique(groupIndex, 'rows', 'first');
    [~, ix2]      = unique(groupIndex, 'rows', 'last');
    ng = size(groups, 1);
    aggArray = nan(ng, size(array, 2));
    for iGr = 1:ng
        aggArray(iGr,:) = collapseFn(array(ix1(iGr):ix2(iGr),:));
    end
I think this is about as fast as it gets without resorting to MEX. Thanks to Matthew Gunn's suggestion! Profiling shows that 'unique' is really cheap here, and grabbing just the first and last index of each run of duplicate rows in groupIndex speeds things up considerably. I get an 88x speedup with this iteration of aggregate.
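As a side note, for the common case of summing with a single integer label vector, MATLAB's built-in accumarray can replace the loop entirely. A sketch (the function name aggregateAccum and the subscript construction are my own, not from the benchmark above):

    function aggArray = aggregateAccum(array, groupIndex)
    % Sum rows of array by the labels in groupIndex (a column vector of
    % positive integers); one output row per label. accumarray sums by
    % default; other collapse functions can be passed as its 4th argument.
    [n, m] = size(array);
    % subscript pairs [label, column] for every element of array(:)
    subs = [repmat(groupIndex, m, 1), kron((1:m)', ones(n, 1))];
    aggArray = accumarray(subs, array(:));

Whether this beats the first/last-index loop above depends on the data shape, so it is worth benchmarking both on your own problem.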