Function call in @parallel leads to huge memory allocations

I created a minimal working example that isolates the problem from my previous question (Julia allocates a huge amount of memory for an unknown reason). It can be tested directly in the REPL. Consider the code:

function test1(n)
    s = zero(Float64)
    for i = 1:10^n
        s += sqrt(rand()^2 + rand()^2 + rand()^2)
    end
    return s
end


function test2(n)
    @parallel (+) for i = 1:10^n
        sqrt(rand()^2 + rand()^2 + rand()^2)
    end
end


function test3(n)
    function add(one, two, three)
        one + two + three
    end

    @parallel (+) for i = 1:10^n
        sqrt(add(rand()^2, rand()^2, rand()^2))
    end
end

Then I check the code:

@time test1(8);
@time test1(8);

@time test2(8);
@time test2(8);

@time test3(8);
@time test3(8);

And here is the output:

elapsed time: 1.017241708 seconds (183868 bytes allocated)
elapsed time: 1.033503964 seconds (96 bytes allocated)

elapsed time: 1.214897591 seconds (3682220 bytes allocated)
elapsed time: 1.020521156 seconds (2104 bytes allocated)

elapsed time: 15.23876415 seconds (9600679268 bytes allocated, 26.69% gc time)
elapsed time: 15.418865707 seconds (9600002736 bytes allocated, 26.19% gc time)

Can someone explain:

  • Why does the first run of each function allocate so much memory?
  • Why is the memory allocated in test2(8) higher than in test1(8)? They do the same thing.
  • Most importantly, what is happening in test3(8)? It allocates a huge amount of memory.

EDIT:

Julia Version 0.3.1
Commit c03f413* (2014-09-21 21:30 UTC)
Platform Info:
  System: Darwin (x86_64-apple-darwin13.3.0)
  CPU: Intel(R) Core(TM) i7-3615QM CPU @ 2.30GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)
  LAPACK: libopenblas
  LIBM: libopenlibm
  LLVM: libLLVM-3.3
1 answer

On the first question: the first call of each function triggers Julia's JIT compiler, and compiling the function (plus, for the parallel versions, the surrounding machinery) is what allocates the extra memory. Once the function is compiled, subsequent calls allocate almost nothing, which is why the second @time is so much lower.
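(A side note of mine, not part of the original answer: the usual way to keep compilation out of a measurement is to call the function once on a small input before timing it.)

test1(1)          # first call triggers JIT compilation of test1
@time test1(8);   # now measures only the run itself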

Also note that test2 and test3 only actually run in parallel when Julia is started with worker processes; with 2 workers (julia -p 2) they run roughly 50% faster here.
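(Another aside of mine, not from the original answer: workers can also be added from a running session instead of the -p flag.)

addprocs(2)   # start two local worker processes
nprocs()      # returns 3: the master process plus the two workers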

The huge allocations in test3, however, have little to do with compilation; they come from how @parallel works. The macro turns the loop body into a "thunk" (an anonymous function) that is shipped to the workers and called there. Because add is defined locally inside test3, the thunk has to capture it as a closed-over variable, so the call to add cannot be resolved at compile time: every iteration goes through dynamic dispatch and boxes its arguments and result, which is what produces the gigabytes of allocations.
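A sketch of a possible workaround (my own addition, assuming Julia 0.3 syntax, not part of the original answer): define add at the top level with @everywhere so every process has it, and the call inside the loop can then be resolved statically. The name test4 is just illustrative:

# add is now a top-level generic function, defined on all processes
@everywhere add(one, two, three) = one + two + three

# same loop as test3, but the call to add no longer goes through a boxed closure variable
function test4(n)
    @parallel (+) for i = 1:10^n
        sqrt(add(rand()^2, rand()^2, rand()^2))
    end
end

With this change the warm-run allocations should drop to roughly the level of test2, since nothing in the loop body needs to be boxed.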

