How does MATLAB handle dynamic array distribution?

Question

How does MATLAB handle dynamic array distribution?

I am not very versed in MATLAB and I wonder how it handles dynamic memory allocation under the hood?

One of the main ways is to highlight large chunks and more than necessary so that you do not have to highlight each added item. After doing a little research, I saw a lot of people who personally controlled their own distribution of large numbers (provided that they did not know their final size) or did things such as creating the maximum size and then trimming. An example is the Undocumented MATLAB , which advises you to do block memory allocation yourself. I would think that a language like MATLAB would know to do this myself, and I would not worry about such problems. This means that if you try to add one new element to the array, MATLAB allocates new memory only for that single element, which is terribly inefficient.

My question is twofold

For dynamic arrays, does MATLAB allocate large chunks and more memory than is necessary to save on computational efficiency, or allocates memory only for what is combined?
If the first case, is there any reason MATLAB decided to go with this design?

+7

memory-management arrays matlab

zephyr Sep 24 '15 at 16:28

source share

3 answers

MATLAB works with fixed-size memory blocks. When you create a new matrix, say, using the zeros function, you allocate enough for the matrix and its metadata. If you look at the notation that MATLAB uses to add to the matrix, it almost explains itself:

>> a = zeros(1, 3) a = 0 0 0 >> a = [a 1] a = 0 0 0 1

You created a new matrix [a 1] (which, by the way, is an alias for horzcat ), then saved it to a . However, you did not need to store it back in a , in which case you will have two matrices floating in memory. This happens every time you combine matrices of any size.

Another sign of what is happening is the size of the matrix specified in the variable inspector. If you notice, it never pretends to allocate anything other than enough space for data and metadata.

The goal of this project is quite simple. MATLAB works with matrices. Matrices usually don't change much, but they often get used to creating new matrices. Thus, the presence of fixed size arrays and a good distribution mechanism is much more important than the user’s desire to expand the arrays. The MATLAB profiler will tell you this if you try to include matrix concatenation in a loop. The experiment will show you that the profiler is not lying. This is simply not the purpose of the language.

All of the above also apply to numpy arrays in Python, except that it is better documented.

In a related note, keep in mind that MATLAB is well integrated with Java, so if you need a well-managed extensible array, you can use java.util.ArrayList directly in MATLAB.

+3

Mad physicist Sep 24 '15 at 16:54

source share

I put together a test function to bring out some more specific details about dynamic allocation. The function used is at the bottom of the answer.

The function calls several location functions that compute the y value from the input x value of variable length with logarithmic diversity. Functions are different, but the type of cycle ( for and while ) and the distribution scheme (dynamic and preliminary allocation). Results taken from R2014b. Below are the results that have passed using four functions.

For a low number of items, the execution time is constant; to count high elements, all four functions enter the growth mode with similar growth rates. Adjusting the data power (i.e., c = ArgMin[c(1)*n^c(2) - time] ) returns an exponent in the range from 0.93 to 0.96 (linear growth) over four data sets.

I am really surprised at these results, because either I'm missing something in the test, or the Matlab JIT compiler is really good at distributing arrays (possibly related lists under the hood). The pre-allocated for -loop worked faster, but only 12% faster than the dynamic version.

Turning to memory consumption, I launched dynamic and pre-allocated versions of the for -loop option for several linearly increasing elements in the ten million element range. The total memory used by Matlab was selected at the beginning and end of the cycles. The results are shown below.

As shown, the initial memory for function calls is identical (for the most part) between the two functions. The memory usage for the pre-allocated version increases linearly with the element count, which is expected. But the memory used for the dynamic version, although piecewise-ruled, has a tendency to jump-like rearrangement at several points. For me, this implies that Matlab does some form of table growth (not necessarily doubling the table, as I think Python), so as not to redistribute the array at each iteration (which makes me think about my thought about linked lists from the above discussion time).

I found these results interesting, but I really can’t conclude much more for the thoughts above. However, regardless of anything, explicit pre-allocation will always be better for runtime and maintenance of the code in any numerical heavy application.

Any comments on the results and the test function (perhaps a discussion of how this test just goes wrong) are, of course, welcome. This is how I (we) learn.

 function [] = test() % Constants and setup funs = {@forDynamic,@forAllocate,@whileDynamic,@whileAllocate}; Nfun = numel(funs) ; % Nalloc = 2E7 ; Nsamp = 50 ; % x = linspace(0,1,Nalloc) ; nIndex = round(logspace(1,log10(Nalloc),Nsamp)) ; times = repmat({zeros(1,Nsamp)},1,Nfun) ; % Array growth time-data for k = 1:numel(funs) f = funs{k}; for m = 1:Nsamp tic; f(x(1:nIndex(m))); times{k}(m) = toc; fprintf(['Iteration ',num2str(m,'%02G'),' of function ',num2str(k),' done.\n']); end end % Plot figure(1); args(2:2:2*Nfun) = times; args(1:2:2*Nfun) = repmat({nIndex},1,Nfun); loglog(args{:}); legend('Dynamic Allocation (for)','Pre-Allocation (for)','Dyanmic Allocation (while)','Pre-Allocation (while)','Location','Northwest'); grid('on'); axis([nIndex(1),nIndex(end),0.95*min([times{:}]),1.05*max([times{:}])]); xlabel('Number of Array Elements [-]'); ylabel('Elasped Time [s]'); % Switch to linear scale near allocation max and only look at for-functions Nsamp = 50 ; nIndex = round(10.^linspace(log10(Nalloc)-1.5,log10(Nalloc),Nsamp)) ; mstart = repmat({zeros(1,Nsamp)},1,Nfun/2) ; mend = repmat({zeros(1,Nsamp)},1,Nfun/2) ; % Array growth memory-data for k = 1:numel(funs)/2 f = funs{k}; for m = 1:Nsamp [~,mstart{k}(m),mend{k}(m)] = f(x(1:nIndex(m))); fprintf(['Iteration ',num2str(m,'%02G'),' of function ',num2str(k),' done.\n']); end end % Plot figure(2); mem = [mstart,mend]; args(2:2:2*Nfun) = mem ; args(1:2:2*Nfun) = repmat({nIndex},1,Nfun); h = plot(args{:}); set(h([1,2]),'LineStyle','--'); set(h([1,3]),'Color','k'); set(h([2,4]),'Color','r'); legend('Dynamic Allocation Start','Pre-Allocation Start','Dynamic Allocation End','Pre-Allocation End','Location','Northwest'); grid('on'); axis([nIndex(1),nIndex(end),0.95*min([mem{:}]),1.05*max([mem{:}])]); xlabel('Number of Array Elements [-]'); ylabel('Matlab Memory Usage [MB]'); end function y = burden(x) y = besselj(0,x); end function mem = getMemory() mem = memory(); mem = mem.MemUsedMATLAB/1E6; %[MB] end function [y,mstart,mend] = forDynamic(x) mstart = getMemory(); n = numel(x) ; for k = 1:n y(k) = burden(x(k)); end mend = getMemory(); end function [y,mstart,mend] = forAllocate(x) mstart = getMemory(); n = numel(x) ; y(1,n) = 0 ; for k = 1:numel(x) y(k) = burden(x(k)); end mend = getMemory(); end function [y,mstart,mend] = whileDynamic(x) mstart = getMemory(); n = numel(x) ; k = 1 ; while k <= n y(k) = burden(x(k)) ; k = k + 1 ; end mend = getMemory(); end function [y,mstart,mend] = whileAllocate(x) mstart = getMemory(); n = numel(x) ; k = 1 ; y(1,n) = 0 ; while k <= n y(k) = burden(x(k)) ; k = k + 1 ; end mend = getMemory(); end

+2

Troyhaskin 25 sept. '15 at 17:53

source share

matlabgui · Accepted Answer · 2015-09-24T16:57:14+0000

I recall that at the Matlab Expo several years ago, they talked about things that were being developed at headquarters, one of which was automatic pre-allocation of memory.

There was no mention of when it could be released, or even if it would ever be ... And I never heard of it, since ...

From my experience - I always controlled the dynamic allocation myself - and always noticed a serious slowdown if this part of the code had problems (i.e. arrays grew when I thought they shouldn't be ...)

Therefore, I think it's fair to say that you need to manage it yourself.

How does MATLAB handle dynamic array distribution?

More articles: