Consider the following code snippet:
```julia
x = Float64[1:10000000];
y = Array(Float64, length(x));

function nonglobal_devec!(x,y)
    for i = 1:length(x)
        y[i] = exp(x[i])
    end
end

function nonglobal_vec(x)
    exp(x)
end

@time nonglobal_devec!(x,y);   # A
@time y = nonglobal_vec(x);    # B

x = Float64[1:10000000];
y = Array(Float64, length(x));
@time for i = 1:length(x)      # C
    y[i] = exp(x[i])
end
@time y = exp(x)               # D
```
which gives times
```
A: elapsed time: 0.072701108 seconds (115508 bytes allocated)
B: elapsed time: 0.074584697 seconds (80201532 bytes allocated)
C: elapsed time: 2.029597656 seconds (959990464 bytes allocated, 22.86% gc time)
D: elapsed time: 0.058509661 seconds (80000128 bytes allocated)
```
The odd one out is C, which is slow because it runs in the global scope, where type inference does not operate, so much slower code is generated.
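The global-scope penalty can be seen directly. Below is a minimal sketch (the function names are illustrative, not from the original answer): reading a non-const global forces dynamic handling on every access, while passing the same array as an argument lets the compiler specialize the loop.

```julia
x = rand(10000000)

# Slow: `x` is a non-const global, so its type could change at any time,
# and the compiler must treat every access to it dynamically.
function sum_global()
    s = 0.0
    for i = 1:length(x)
        s += x[i]
    end
    s
end

# Fast: the argument's concrete type is known when the function is
# compiled, so the loop compiles to tight machine code.
function sum_arg(x)
    s = 0.0
    for i = 1:length(x)
        s += x[i]
    end
    s
end
```

In recent Julia versions, `@code_warntype sum_global()` highlights the `Any`-typed accesses, while the same check on `sum_arg(x)` shows fully concrete types.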
The relative timings of A and B are subject to some variability, because functions are compiled the first time they are used. If we run them again, we get
```
A2: elapsed time: 0.038542212 seconds (80 bytes allocated)
B2: elapsed time: 0.063630172 seconds (80000128 bytes allocated)
```
which makes sense, since A2 does not allocate memory (the 80 bytes are for the return value of the function), while B2 creates a new vector for its result. Note also that B2 now allocates the same amount of memory as D: roughly 8 bytes × 10,000,000 elements for the output vector, plus a small header. The extra memory reported the first time was allocated during compilation.
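If B2's allocation matters, the vectorized form can also reuse a preallocated output. A short sketch, using the fused in-place broadcasting syntax available in Julia 0.6 and later (not in the version these timings were taken with):

```julia
y = similar(x)   # preallocate the output once
y .= exp.(x)     # fused, in-place broadcast: no new vector is allocated
```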
Finally, devectorized versus vectorized code should be weighed case by case. For example, if you implemented matrix multiplication naively with loops and did not know about cache effects, you would likely be much slower than the vectorized A*b, which calls BLAS.
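As an illustration of that last point, here is a hypothetical naive matrix-vector product (`naive_matvec` is not from the original answer); its inner loop walks across a row of `A`, which is cache-unfriendly for Julia's column-major arrays, while `A*b` dispatches to a tuned BLAS routine:

```julia
function naive_matvec(A, b)
    m, n = size(A)
    y = zeros(eltype(A), m)
    for i = 1:m
        for j = 1:n
            y[i] += A[i,j] * b[j]   # strides across a row: poor cache locality
        end
    end
    return y
end

A = rand(1000, 1000); b = rand(1000)
@time naive_matvec(A, b)   # naive loops
@time A * b                # BLAS-backed, typically much faster
```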