What is the fastest way to count elements in an array?

In my models, one of the most repetitive tasks that need to be done is counting the number of each element in an array. The count is from a closed set, so I know that there are types of X elements, and all or some of them fill the array along with zeros that represent the "empty" cells. The array is not sorted in any way and can take quite a while (about 1 M elements), and this task is performed thousands of times during one simulation (which is also part of hundreds of simulations). The result should be a vector r size X , so r(k) is the number of k in the array.

Example:

For X = 9 , if I have the following input vector:

 v = [0 7 8 3 0 4 4 5 3 4 4 8 3 0 6 8 5 5 0 3] 

I would like to get this result:

 r = [0 0 4 4 3 1 1 3 0] 

Please note that I do not need to count zeros, and elements that are not displayed in the array (for example, 2 ) have 0 in the corresponding position of the resulting vector ( r(2) == 0 ).

What will be a quick way to achieve this?

+7
performance arrays count matlab binning
source share
2 answers

tl; dr: The fastest method depends on the size of the array. For an array smaller than 2 14, method 3 below ( accumarray ) is faster. For arrays larger than this method 2 below ( histcounts ) is better.

UPDATE: I tested this also with implicit translation , which was introduced in 2016b, and the results are almost equal to the bsxfun approach, with no significant differences in this method (relative to other methods).


Let's see what methods are available to accomplish this task. In the following examples, we will assume that X has n elements, from 1 to n , and our interest is M , which is an array of columns that can vary in size. Our result vector will be spp 1 so spp(k) is the number k in M Although I write about X here, there is no explicit implementation in the code below, I just define n = 500 and X implicitly 1:500 .

Naive for loop

The easiest and easiest way to cope with this task is the for loop, which iterates over elements from X and counts the number of elements in M equal to it:
 function spp = loop(M,n) spp = zeros(n,1); for k = 1:size(spp,1); spp(k) = sum(M==k); end end 

This is not very smart, especially if only a small group of elements from X fills M , so it’s better to first look at those that are already in M :

 function spp = uloop(M,n) u = unique(M); % finds which elements to count spp = zeros(n,1); for k = u(u>0).'; spp(k) = sum(M==k); end end 

Typically, MATLAB recommends using as many built-in functions as possible, since in most cases they are much faster. I thought of 5 options:

1. tabulate function

The tabulate function returns a very convenient frequency table, which at first glance seems like the perfect solution for this task:
 function tab = tabi(M) tab = tabulate(M); if tab(1)==0 tab(1,:) = []; end end 

The only thing you need to do is delete the first row of the table if it counts the element 0 (maybe there are no zeros in M ).

2. histcounts function

Another option that can easily histcounts for our needs,
 function spp = histci(M,n) spp = histcounts(M,1:n+1); end 

here, to read all the different elements between 1 and n separately, we define the edges as 1:n+1 , so each element from X has its own bit. We could also write histcounts(M(M>0),'BinMethod','integers') , but I already tested it and it takes longer (although it makes the function independent of n ).

3. accumarray function

The next option I will give here is to use the accumarray function:
 function spp = accumi(M) spp = accumarray(M(M>0),1); end 

here we give the function M(M>0) as input to skip zeros and use 1 as vals to count all unique elements.

4. bsxfun function

We can even use the binary @eq operation (ie == ) to search for all elements from each type:
 function spp = bsxi(M,n) spp = bsxfun(@eq,M,1:n); spp = sum(spp,1); end 

if we save the first input M and the second 1:n in different dimensions, so one is a column vector, the other is a row vector, then the function compares each element in M with each element in 1:n and creates length(M) -by- n logical matrix than we can summarize to get the desired result.

5. ndgrid function

Another option, similar to bsxfun , is to explicitly create two matrices of all capabilities using the ndgrid function:
 function spp = gridi(M,n) [Mx,nx] = ndgrid(M,1:n); spp = sum(Mx==nx); end 

then we compare them and summarize them in columns to get the final result.

Benchmarking

I did a little test to find the fastest method of all the above, I determined n = 500 for all routes. For some (especially naive for ), there is a big influence of n on runtime, but this is not a problem, since we want to check it for a given n .

Here are the results: Timing hist

We can notice a few things:

  1. Interestingly, there is a transition to the fastest method. For arrays less than 2 14 accumarray . For arrays larger than 2 14 histcounts .
  2. As expected, the naive for loops in both versions are the slowest, but for arrays smaller than 2 8 the "unique & for" option is slower. ndgrid becomes the slowest in arrays larger than 2 11 probably due to the need to store very large matrices in memory.
  3. There is some irregularity in how tabulate works on arrays smaller than 2 9 . This result was consistent (with some changes in the design) in all my tests.

(the bsxfun and ndgrid are truncated because it forces my computer ndgrid higher values, and the trend is already completely clear)

Also note that the y axis is in log 10 , so decreasing one (e.g. for arrays of size 2 19 between accumarray and histcounts ) means histcounts 10 times faster.

I will be glad to hear in the comments about the improvements of this test, and if you have another, conceptually different method, you can offer it as an answer.

The code

Here are all the functions related to the synchronization function:

 function out = timing_hist(N,n) M = randi([0 n],N,1); func_times = {'for','unique & for','tabulate','histcounts','accumarray','bsxfun','ndgrid'; timeit(@() loop(M,n)),... timeit(@() uloop(M,n)),... timeit(@() tabi(M)),... timeit(@() histci(M,n)),... timeit(@() accumi(M)),... timeit(@() bsxi(M,n)),... timeit(@() gridi(M,n))}; out = cell2mat(func_times(2,:)); end function spp = loop(M,n) spp = zeros(n,1); for k = 1:size(spp,1); spp(k) = sum(M==k); end end function spp = uloop(M,n) u = unique(M); spp = zeros(n,1); for k = u(u>0).'; spp(k) = sum(M==k); end end function tab = tabi(M) tab = tabulate(M); if tab(1)==0 tab(1,:) = []; end end function spp = histci(M,n) spp = histcounts(M,1:n+1); end function spp = accumi(M) spp = accumarray(M(M>0),1); end function spp = bsxi(M,n) spp = bsxfun(@eq,M,1:n); spp = sum(spp,1); end function spp = gridi(M,n) [Mx,nx] = ndgrid(M,1:n); spp = sum(Mx==nx); end 

And here is the script to run this code and create a graph:

 N = 25; % it is not recommended to run this with N>19 for the 'bsxfun' and 'ndgrid' functions. func_times = zeros(N,5); for n = 1:N func_times(n,:) = timing_hist(2^n,500); end % plotting: hold on mark = 'xo*^dsp'; for k = 1:size(func_times,2) plot(1:size(func_times,1),log10(func_times(:,k).*1000),['-' mark(k)],... 'MarkerEdgeColor','k','LineWidth',1.5); end hold off xlabel('Log_2(Array size)','FontSize',16) ylabel('Log_{10}(Execution time) (ms)','FontSize',16) legend({'for','unique & for','tabulate','histcounts','accumarray','bsxfun','ndgrid'},... 'Location','NorthWest','FontSize',14) grid on 

1 The reason for this strange name comes from my Ecology field. My models are cellular automata that typically mimic individual organisms in virtual space ( M above). People have different types (hence, spp ) and together form the so-called "ecological community". The β€œstate” of the community is determined by the number of individuals from each species, which is the spp vector in this answer. In these models, we first define the species pool ( X above) for the individuals to be drawn, and the community state takes into account all species in the species pool, and not just those that are present in M

+10
source share

We know that the input vector always contains integers, so why not use this to β€œsqueeze” a little more performance from the algorithm?

I experimented with some optimizations of the two best binning methods suggested by the OP , and this is what I came up with:

  • The number of unique values ​​( X in the question or n in the example) must be explicitly converted to an integer type (unsigned).
  • It is faster to compute the extra bit and then discard it than to "only process" valid values ​​(see accumi_new below).

Although the improvements are small, it can be significant when large data sets are involved.

This feature takes about 30 seconds to work on my machine. I am using MATLAB R2016a.


 function q38941694 datestr(now) N = 25; func_times = zeros(N,4); for n = 1:N func_times(n,:) = timing_hist(2^n,500); end % Plotting: figure('Position',[572 362 758 608]); hP = plot(1:n,log10(func_times.*1000),'-o','MarkerEdgeColor','k','LineWidth',2); xlabel('Log_2(Array size)'); ylabel('Log_{10}(Execution time) (ms)') legend({'histcounts (double)','histcounts (uint)','accumarray (old)',... 'accumarray (new)'},'FontSize',12,'Location','NorthWest') grid on; grid minor; set(hP([2,4]),'Marker','s'); set(gca,'Fontsize',16); datestr(now) end function out = timing_hist(N,n) % Convert n into an appropriate integer class: if n < intmax('uint8') classname = 'uint8'; n = uint8(n); elseif n < intmax('uint16') classname = 'uint16'; n = uint16(n); elseif n < intmax('uint32') classname = 'uint32'; n = uint32(n); else % n < intmax('uint64') classname = 'uint64'; n = uint64(n); end % Generate an input: M = randi([0 n],N,1,classname); % Time different options: warning off 'MATLAB:timeit:HighOverhead' func_times = {'histcounts (double)','histcounts (uint)','accumarray (old)',... 'accumarray (new)'; timeit(@() histci(double(M),double(n))),... timeit(@() histci(M,n)),... timeit(@() accumi(M)),... timeit(@() accumi_new(M)) }; out = cell2mat(func_times(2,:)); end function spp = histci(M,n) spp = histcounts(M,1:n+1); end function spp = accumi(M) spp = accumarray(M(M>0),1); end function spp = accumi_new(M) spp = accumarray(M+1,1); spp = spp(2:end); end 
+5
source share

All Articles