I am trying to parallelize a small scientific code I wrote. But when I add @parallel, the same code on just one processor suddenly takes 10 times as long to run. It should take about the same amount of time. The first version of the code makes one memory allocation, and the second makes 20, but zeros(Float64, num_bins) should not be a bottleneck. num_bins equals 1800, so each call to zeros() should allocate 8 * 1800 = 14,400 bytes. Twenty calls allocating 14,400 bytes each should not take this long.
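As a sanity check on that arithmetic, a minimal standalone snippet like the one below (only num_bins is reused from my code) should report on the order of 20 × 14,400 ≈ 288 KB of total allocation:

num_bins = 1800
# Each zeros(Float64, num_bins) call allocates 8 * 1800 = 14,400 bytes
# (plus a small array header), so 20 calls should total roughly 288 KB.
@time for k = 1:20
    zeros(Float64, num_bins)
end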
I can't figure out what I'm doing wrong, and Julia's documentation is vague and non-specific about how variables are made available inside @parallel. Both versions of the code below calculate the correct values for the vector rdf. Can anyone tell, by looking at the second one, what makes it allocate so much memory and take so long? (A function-wrapped sketch of the same loop follows the timings below.)
atoms = readAtoms(file)
rdf = zeros(Float64, num_bins)
@time for k = 1:20
    for i = 1:num_atoms
        for j = 1:num_atoms
            r = distance(k, atoms, i, atoms, j)
            bin_number = floor(Int, r / dr) + 1
            rdf[bin_number] += 1  # accumulate into one shared histogram
        end
    end
end
elapsed time: 8.1 seconds (0 bytes allocated)
atoms = readAtoms(file)
@time rdf = @parallel (+) for k = 1:20
    rdf_part = zeros(Float64, num_bins)  # per-iteration partial histogram
    for i = 1:num_atoms
        for j = 1:num_atoms
            r = distance(k, atoms, i, atoms, j)
            bin_number = floor(Int, r / dr) + 1
            rdf_part[bin_number] += 1
        end
    end
    rdf_part  # reduced elementwise by (+)
end
elapsed time: 81.2 seconds (33472513332 bytes allocated, 17.40% gc time)
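For what it's worth, here is a minimal sketch of how I understand the same reduction would look with everything passed as function arguments instead of read from global scope; compute_rdf is an illustrative name, and distance is my own helper from the code above:

# Sketch: same reduction, but all inputs are arguments, so the @parallel
# body closes over local variables rather than globals.
function compute_rdf(atoms, num_atoms, num_bins, dr)
    @parallel (+) for k = 1:20
        rdf_part = zeros(Float64, num_bins)
        for i = 1:num_atoms
            for j = 1:num_atoms
                r = distance(k, atoms, i, atoms, j)
                bin_number = floor(Int, r / dr) + 1
                rdf_part[bin_number] += 1
            end
        end
        rdf_part
    end
end

rdf = @time compute_rdf(atoms, num_atoms, num_bins, dr)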