A faster way to initialize arrays using empty matrix multiplication? (MATLAB)

I came across a strange way (in my opinion) in which MATLAB deals with empty matrices. For example, if two empty matrices are multiplied, the result is:

    >> zeros(3,0)*zeros(0,3)
    ans =
         0     0     0
         0     0     0
         0     0     0

This in itself took me by surprise; however, a quick search led me to an explanation of the somewhat convoluted logic behind why this happens.

However, nothing prepared me for the next observation. I asked myself: how efficient is this type of multiplication compared with simply using the zeros(n) function for initialization? I used timeit to answer this:

    f = @() zeros(1000);
    timeit(f)
    ans =
        0.0033

against

    g = @() zeros(1000,0)*zeros(0,1000);
    timeit(g)
    ans =
       9.2048e-06

Both produce the same result, a 1000×1000 matrix of zeros of class double, but the empty-matrix multiplication is ~350 times faster! (A similar result occurs using tic and toc in a loop.)

How can this be? Are timeit and tic/toc bluffing, or have I found a faster way to initialize matrices? (This was done using MATLAB 2012a, on a Win7-64 machine, Intel i5-650 3.2 GHz ...)

EDIT:

After reading your answers, I examined this feature more carefully and tested code that measures runtime as a function of matrix size n on two different computers (same MATLAB version, 2012a). This is what I get:

[plot: runtime vs. matrix size n for zeros(n) and empty-matrix multiplication]

The code generating this uses timeit as before, but a loop with tic and toc looks the same. For small sizes, zeros(n) is comparable. However, around n = 400 there is a jump in performance for the empty-matrix multiplication. The code I used to create this plot was:

    n = unique(round(logspace(0,4,200)));
    for k = 1:length(n)
        f = @() zeros(n(k));
        t1(k) = timeit(f);
        g = @() zeros(n(k),0)*zeros(0,n(k));
        t2(k) = timeit(g);
    end
    loglog(n, t1, 'b', n, t2, 'r');
    legend('zeros(n)', 'zeros(n,0)*zeros(0,n)', 2);
    xlabel('matrix size (n)');
    ylabel('time [sec]');

Do you also experience this?

EDIT #2:

By the way, empty matrix multiplication is not required to obtain this effect. You can simply do:

    z(n,n) = 0;

where n is greater than a certain threshold matrix size seen in the previous graph; this gives exactly the same efficiency profile as empty-matrix multiplication (again measured using timeit).

[plot: runtime vs. matrix size n for z(n,n) = 0, matching the empty-matrix-multiplication profile]

Here's an example where it improves code efficiency:

    n = 1e4;

    clear z1
    tic
    z1 = zeros( n );
    for cc = 1 : n
        z1(:,cc) = cc;
    end
    toc % Elapsed time is 0.445780 seconds.

    %%
    clear z0
    tic
    z0 = zeros(n,0)*zeros(0,n);
    for cc = 1 : n
        z0(:,cc) = cc;
    end
    toc % Elapsed time is 0.297953 seconds.

However, using z(n,n) = 0; instead gives timings similar to those of zeros(n) in this case.

+56
performance initialization matrix-multiplication matlab
Jan 05 '13 at 6:15
5 answers

This is strange: I see f being faster and g being slower than what you see, and for me both are identical. Perhaps a different MATLAB version?

    >> g = @() zeros(1000, 0) * zeros(0, 1000);
    >> f = @() zeros(1000)
    f =
        @()zeros(1000)
    >> timeit(f)
    ans =
       8.5019e-04
    >> timeit(f)
    ans =
       8.4627e-04
    >> timeit(g)
    ans =
       8.4627e-04

EDIT: you can append + 1 to the end of f and g and see what timings you get.

EDIT January 6, 2013 7:42 EST

I am using the machine remotely, so sorry for the low-quality figures (I had to take screenshots).

Machine configuration:

i7 920 @ 2.653 GHz. Linux. 12 GB of RAM. 8 MB cache.

[plot: timing graph generated on the i7 920]

It seems that even the machine I have access to shows this behavior, except at a larger threshold (somewhere between 1979 and 2073). There is no reason I can think of right now why the empty-matrix multiplication would be faster at large sizes.

I will investigate a bit more before coming back.

EDIT January 11, 2013

After @EitanT's post, I wanted to dig a little deeper. I wrote some C code to see how MATLAB might be creating a matrix of zeros. Here is the C++ code I used.

    // timer:: is a small wall-clock timing helper class (not shown in the original post)
    #include <cstdio>
    #include <cstdlib>
    #include <cstring>

    int main(int argc, char **argv)
    {
        for (int i = 1975; i <= 2100; i += 25) {
            // method 1: malloc + explicit zeroing loop
            timer::start();
            double *foo = (double *)malloc(i * i * sizeof(double));
            for (int k = 0; k < i * i; k++)
                foo[k] = 0;
            double mftime = timer::stop();
            free(foo);

            // method 2: malloc + memset
            timer::start();
            double *bar = (double *)malloc(i * i * sizeof(double));
            memset(bar, 0, i * i * sizeof(double));
            double mmtime = timer::stop();
            free(bar);

            // method 3: calloc
            timer::start();
            double *baz = (double *)calloc(i * i, sizeof(double));
            double catime = timer::stop();
            free(baz);

            printf("%d, %lf, %lf, %lf\n", i, mftime, mmtime, catime);
        }
    }

Here are the results.

    $ ./test
    1975, 0.013812, 0.013578, 0.003321
    2000, 0.014144, 0.013879, 0.003408
    2025, 0.014396, 0.014219, 0.003490
    2050, 0.014732, 0.013784, 0.000043
    2075, 0.015022, 0.014122, 0.000045
    2100, 0.014606, 0.014480, 0.000045

As you can see, calloc (4th column) appears to be the fastest method. It also speeds up dramatically between 2025 and 2050 (I would guess the threshold is around 2048?).

Then I went back to MATLAB to check the same thing. Here are the results.

    >> test
    1975, 0.003296, 0.003297
    2000, 0.003377, 0.003385
    2025, 0.003465, 0.003464
    2050, 0.015987, 0.000019
    2075, 0.016373, 0.000019
    2100, 0.016762, 0.000020

It appears that both f() and g() use calloc at smaller sizes (< 2048?). At larger sizes, however, f() (zeros(m,n)) switches to malloc + memset, while g() (zeros(m,0)*zeros(0,n)) keeps using calloc.

Thus, the discrepancy is explained by the following:

  • zeros(..) starts using a different (slower?) scheme at larger sizes.
  • calloc also behaves somewhat unexpectedly, leading to improved performance.
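For what it's worth, the lazy calloc effect above is easy to reproduce outside C as well. As a sketch (an assumption about typical allocator behavior, not something MATLAB-specific): numpy's zeros usually obtains pages that the OS already guarantees to be zero, calloc-style, so the allocation itself returns almost immediately, while the first full write pays the page-fault cost:

```python
import time
import numpy as np

n = 4096  # big enough (~128 MB) that the allocation should go through the calloc/mmap path

# Allocation: the OS hands back pages already guaranteed to be zero,
# so no explicit memset-style pass runs here.
t0 = time.perf_counter()
a = np.zeros((n, n))
t_alloc = time.perf_counter() - t0

# First touch: writing every element forces the pages to be faulted in,
# which is where the "hidden" cost of the lazy allocation shows up.
t0 = time.perf_counter()
a += 1.0
t_touch = time.perf_counter() - t0

print(f"allocate:    {t_alloc:.6f} s")
print(f"first touch: {t_touch:.6f} s")
```

On a typical Linux box the first number is orders of magnitude smaller than the second, mirroring the catime column above; exact figures will vary by OS and allocator.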

This is the behavior on Linux. Can someone run the same experiment on a different machine (and perhaps a different OS) and see whether the same behavior holds?

+32
Jan 05 '13 at 7:05

The results can be a little misleading. When you multiply two empty matrices, the resulting matrix is not immediately "allocated" and "initialized"; rather, this is deferred until you first use it (somewhat like lazy evaluation).

The same applies when indexing out of bounds to grow a variable, which in the case of numeric arrays fills any missing entries with zeros (I discuss the non-numeric case below). Of course, growing a matrix this way does not overwrite existing elements.

So although it may seem faster, you are just delaying the allocation time until the matrix is first used. In the end you will have timings similar to those you would get by allocating from the very beginning.

An example to show this behavior, compared with a few other alternatives:

    N = 1000;

    clear z
    tic, z = zeros(N,N); toc
    tic, z = z + 1; toc
    assert(isequal(z,ones(N)))

    clear z
    tic, z = zeros(N,0)*zeros(0,N); toc
    tic, z = z + 1; toc
    assert(isequal(z,ones(N)))

    clear z
    tic, z(N,N) = 0; toc
    tic, z = z + 1; toc
    assert(isequal(z,ones(N)))

    clear z
    tic, z = full(spalloc(N,N,0)); toc
    tic, z = z + 1; toc
    assert(isequal(z,ones(N)))

    clear z
    tic, z(1:N,1:N) = 0; toc
    tic, z = z + 1; toc
    assert(isequal(z,ones(N)))

    clear z
    val = 0;
    tic, z = val(ones(N)); toc
    tic, z = z + 1; toc
    assert(isequal(z,ones(N)))

    clear z
    tic, z = repmat(0, [N N]); toc
    tic, z = z + 1; toc
    assert(isequal(z,ones(N)))

The result shows that if you add up the elapsed times of both commands in each case, you end up with similar total timings:

    // zeros(N,N)
    Elapsed time is 0.004525 seconds.
    Elapsed time is 0.000792 seconds.

    // zeros(N,0)*zeros(0,N)
    Elapsed time is 0.000052 seconds.
    Elapsed time is 0.004365 seconds.

    // z(N,N) = 0
    Elapsed time is 0.000053 seconds.
    Elapsed time is 0.004119 seconds.

Other timings:

    // full(spalloc(N,N,0))
    Elapsed time is 0.001463 seconds.
    Elapsed time is 0.003751 seconds.

    // z(1:N,1:N) = 0
    Elapsed time is 0.006820 seconds.
    Elapsed time is 0.000647 seconds.

    // val(ones(N))
    Elapsed time is 0.034880 seconds.
    Elapsed time is 0.000911 seconds.

    // repmat(0, [N N])
    Elapsed time is 0.001320 seconds.
    Elapsed time is 0.003749 seconds.

These measurements are tiny (milliseconds) and may not be very accurate, so you may want to run each command several thousand times in a loop and take the average. Also, running saved M-functions is sometimes faster than running scripts or commands at the prompt, since certain optimizations only kick in that way...

In any case, allocation is usually done once, so who cares if it takes an extra 30 ms :)




Similar behavior can be observed with cell arrays or arrays of structures. Consider the following example:

    N = 1000;

    tic, a = cell(N,N); toc
    tic, b = repmat({[]}, [N,N]); toc
    tic, c{N,N} = []; toc

which gives:

    Elapsed time is 0.001245 seconds.
    Elapsed time is 0.040698 seconds.
    Elapsed time is 0.004846 seconds.

Note that even though they are all equal, they occupy different amounts of memory:

    >> assert(isequal(a,b,c))
    >> whos a b c
      Name         Size                 Bytes  Class    Attributes

      a         1000x1000            8000000  cell
      b         1000x1000          112000000  cell
      c         1000x1000            8000104  cell

Actually, the situation here is a little more complicated, since MATLAB probably shares the same empty matrix across all the cells rather than creating multiple copies.

The cell array a is really an array of uninitialized cells (an array of NULL pointers), while b is a cell array where each cell is an empty array [] (internally, because of data sharing, only the first cell b{1} points to [] while all the others hold a reference to it). The final array c is similar to a (uninitialized cells), but with the last cell containing an empty numeric matrix [].




I looked at the list of exported C functions from libmx.dll (using the Dependency Walker tool), and found a few interesting things:

  • There are undocumented functions for creating uninitialized arrays: mxCreateUninitDoubleMatrix, mxCreateUninitNumericArray, and mxCreateUninitNumericMatrix. In fact, there is a submission on the File Exchange that uses these functions to provide a faster alternative to the zeros function.

  • There is an undocumented function called mxFastZeros. Googling around, I can see that you cross-posted this question on MATLAB Answers, where there are some excellent answers as well. James Tursa (the author of the UNINIT submission mentioned earlier) gives an example of how to use this undocumented function.

  • libmx.dll is linked against the shared library tbbmalloc.dll. This is the Intel TBB scalable memory allocator. It provides equivalent memory-allocation functions (malloc, calloc, free) optimized for parallel applications. Remember that many MATLAB functions are automatically multithreaded, so I wouldn't be surprised if zeros(..) is multithreaded and uses the Intel memory allocator once the matrix size is large enough (here is a recent comment by Loren Shure confirming this fact).

Regarding the last point about the memory allocator, you could write a similar benchmark in C/C++, along the lines of what @PavanYalamanchili did, and compare the available allocators. Something like this. Keep in mind that MEX files have a slightly higher memory-management overhead, since MATLAB automatically frees any memory allocated in MEX files using mxCalloc, mxMalloc, or mxRealloc. For what it's worth, in older versions it used to be possible to change the internal memory manager.




EDIT:

Here is a more thorough benchmark comparing the alternatives discussed. It specifically shows that once you stress the use of the entire allocated matrix, all three methods are on equal footing and the difference is negligible.

    function compare_zeros_init()
        iter = 100;
        for N = 512.*(1:8)
            % ZEROS(N,N)
            t = zeros(iter,3);
            for i = 1:iter
                clear z
                tic, z = zeros(N,N);           t(i,1) = toc;
                tic, z(:) = 9;                 t(i,2) = toc;
                tic, z = z + 1;                t(i,3) = toc;
            end
            fprintf('N = %4d, ZEROS = %.9f\n', N, mean(sum(t,2)))

            % z(N,N) = 0
            t = zeros(iter,3);
            for i = 1:iter
                clear z
                tic, z(N,N) = 0;               t(i,1) = toc;
                tic, z(:) = 9;                 t(i,2) = toc;
                tic, z = z + 1;                t(i,3) = toc;
            end
            fprintf('N = %4d, GROW = %.9f\n', N, mean(sum(t,2)))

            % ZEROS(N,0)*ZEROS(0,N)
            t = zeros(iter,3);
            for i = 1:iter
                clear z
                tic, z = zeros(N,0)*zeros(0,N); t(i,1) = toc;
                tic, z(:) = 9;                  t(i,2) = toc;
                tic, z = z + 1;                 t(i,3) = toc;
            end
            fprintf('N = %4d, MULT = %.9f\n\n', N, mean(sum(t,2)))
        end
    end

Below are the timings averaged over 100 iterations, as a function of increasing matrix size. I ran the tests in R2013a.

    >> compare_zeros_init
    N =  512, ZEROS = 0.001560168
    N =  512, GROW = 0.001479991
    N =  512, MULT = 0.001457031

    N = 1024, ZEROS = 0.005744873
    N = 1024, GROW = 0.005352638
    N = 1024, MULT = 0.005359236

    N = 1536, ZEROS = 0.011950846
    N = 1536, GROW = 0.009051589
    N = 1536, MULT = 0.008418878

    N = 2048, ZEROS = 0.012154002
    N = 2048, GROW = 0.010996315
    N = 2048, MULT = 0.011002169

    N = 2560, ZEROS = 0.017940950
    N = 2560, GROW = 0.017641046
    N = 2560, MULT = 0.017640323

    N = 3072, ZEROS = 0.025657999
    N = 3072, GROW = 0.025836506
    N = 3072, MULT = 0.051533432

    N = 3584, ZEROS = 0.074739924
    N = 3584, GROW = 0.070486857
    N = 3584, MULT = 0.072822335

    N = 4096, ZEROS = 0.098791732
    N = 4096, GROW = 0.095849788
    N = 4096, MULT = 0.102148452
+29
Aug 13 '13 at 19:51

After some research, I found this article on Undocumented Matlab, in which Mr. Yair Altman had already come to the conclusion that MathWorks' prevailing method of preallocating a matrix using zeros(M, N) is indeed not the most efficient one.

He timed x = zeros(M,N) against clear x, x(M,N) = 0 and found that the latter is about 500 times faster. According to his explanation, the second method simply creates an M-by-N matrix whose elements are automatically initialized to 0. The first method, however, first creates x (an M-by-N matrix with automatically zeroed elements) and then assigns zero to every element of x again, which is a redundant operation that takes longer.

In the case of the empty-matrix multiplication you show in your question, MATLAB expects the product to be an M×N matrix and therefore allocates an M×N matrix, whose elements are automatically initialized to zero. Since the input matrices are empty, no further computation is performed, so the elements of the output matrix stay unchanged and equal to zero.
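As a cross-check of this reasoning (an illustration in Python/numpy, not MATLAB), any BLAS-style product with an inner dimension of zero behaves the same way: each output entry is a sum over an empty index range, which is 0, so the full M×N output comes back already zeroed:

```python
import numpy as np

A = np.zeros((3, 0))   # a 3x0 empty matrix
B = np.zeros((0, 3))   # a 0x3 empty matrix

# C[i, j] = sum over k of A[i, k] * B[k, j]; the k-range is empty, so every entry is 0
C = A @ B

print(C.shape)  # (3, 3)
print(C)
```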

+26
Jan 07 '13 at

An interesting question: apparently there are several ways to "beat" the built-in zeros function. My only guess as to why this happens is that zeros might be built for memory efficiency (after all, zeros(LargeNumber) is more likely to make MATLAB hit the memory limit than to create a timing bottleneck in most code) or for robustness.

Here is another quick allocation method using a sparse matrix; I added the regular zeros function as a reference:

    tic; x = zeros(1000,1000); toc
    Elapsed time is 0.002863 seconds.

    tic; clear x; x(1000,1000) = 0; toc
    Elapsed time is 0.000282 seconds.

    tic; x = full(spalloc(1000,1000,0)); toc
    Elapsed time is 0.000273 seconds.

    tic; x = spalloc(1000,1000,1000000); toc   % Is this the same for practical purposes?
    Elapsed time is 0.000281 seconds.
+2
Jan 11 '13 at 10:23

This answer is false; I keep it here for the record. It seems that MATLAB decides to use sparse matrices when it receives a command such as z(n,n) = 0 with n large enough. The internal implementation might take the form of a condition like (if newsize > THRESHOLD + oldsize then use sparse ...), where THRESHOLD is your "threshold size".

This is despite the fact that MathWorks claims: "MATLAB never creates sparse matrices automatically" (link).

I don't have MATLAB at hand to check this. Still, internal use of sparse matrices is the shorter explanation (Occam's razor), so it is the better one until falsified.

-3
May 6 '15 at 16:35


