According to a comment by @tholy, the amazing thing about Julia is that the built-in functions are not special: they are no faster than custom ones, and both can be fast. I modified your function so that it runs about as fast as the built-in `cumsum`:
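    function testA!(arr)
        # Outer loop over columns, inner loop over rows: Julia arrays are
        # column-major, so the inner loop walks contiguous memory.
        @inbounds for i in 1:size(arr, 2)
            tmp = arr[1, i]
            for k in 2:size(arr, 1)
                tmp += arr[k, i]
                arr[k, i] = tmp
            end
        end
        arr
    end

    function testB!(arr)
        # Recent Julia versions require an explicit `dims` for matrices.
        cumsum!(arr, arr, dims=1)
    end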
I built test arrays:
    arr = rand(1:100, 10^5, 10^2)
    arr2 = copy(arr)
and I got the following timings:
    @time testA!(arr)
    0.007645 seconds (4 allocations: 160 bytes)

    @time testB!(arr2)
    0.007704 seconds (4 allocations: 160 bytes)
which are basically equivalent.
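Equal timings only mean something if both versions compute the same thing, so here is a minimal sanity check on a tiny matrix. The name `colcumsum!` is a hypothetical stand-in with the same loop structure as `testA!`, and the sketch assumes a Julia version where `cumsum` takes the `dims` keyword:

```julia
# Hypothetical helper: column-wise running sum, same loops as testA!.
function colcumsum!(arr)
    @inbounds for i in 1:size(arr, 2)
        tmp = arr[1, i]
        for k in 2:size(arr, 1)
            tmp += arr[k, i]
            arr[k, i] = tmp
        end
    end
    arr
end

small = [1 10; 2 20; 3 30]           # 3x2 test matrix
expected = cumsum(small, dims=1)     # out-of-place reference result
result = colcumsum!(copy(small))     # in-place on a copy

println(result == expected)          # true
println(result)                      # [1 10; 3 30; 6 60]
```

When timing with `@time`, it also helps to call each function once beforehand, since the first call includes compilation.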