Julia: efficiently parallelizing the dissimilarity matrix calculation

Julia supports parallelization using tools such as pmap() and @parallel.

I am trying to calculate the dissimilarity matrix for a dataset:

    n = length(dataset)
    mat = zeros(n, n)
    for i = 1:n
        for j = i+1:n
            mat[i, j] = mat[j, i] = f_dist(dataset[i], dataset[j])
        end
    end

Since the calculations are independent, I believe this is a good candidate for parallel computing.

My attempts to use pmap() and @parallel both ended up slower than the serial version.

    mat = @parallel (+) for comb in collect(combinations([1:n], 2))
        submat = zeros(n, n)
        i = comb[1]
        j = comb[2]
        dist = f_dist(dataset[i], dataset[j])
        submat[i, j] = dist
        submat[j, i] = dist
        submat
    end

I understand that @parallel is a bad way to go here: every iteration allocates a full n×n matrix of zeros just to set two entries, and the (+) reduction then adds them all together. Very inefficient.

Is there an effective way to make this work? I tried SharedArrays and DistributedArrays, but couldn't figure out how to achieve what I want.

Thanks.

1 answer

With SharedArrays this should be pretty simple. The following is off the top of my head and untested, but something like:

    mat = SharedArray(Float64, n, n)
    combs = collect(combinations([1:n], 2))
    # chunk boundaries must be integers before they can be used as indices
    chunkbreaks = round(Int, linspace(0, length(combs), nworkers()+1))
    @sync begin
        for (i, wpid) in enumerate(workers())
            @async begin
                remotecall_wait(wpid, myfunc, mat, combs[chunkbreaks[i]+1:chunkbreaks[i+1]])
            end
        end
    end

where myfunc performs the distance calculations for its assigned index pairs and writes the results into mat.
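
For concreteness, a minimal sketch of what myfunc might look like (the name is the answer's placeholder; dataset and f_dist are assumed to be defined on every worker, e.g. via @everywhere):

    @everywhere function myfunc(mat, combs)
        # Each worker fills in its assigned (i, j) pairs; writes to the
        # SharedArray are visible to all processes.
        for comb in combs
            i, j = comb[1], comb[2]
            mat[i, j] = mat[j, i] = f_dist(dataset[i], dataset[j])
        end
    end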

I should add, however, that if your dissimilarity computation is cheap, serializing combs can make this slower than the single-threaded version. You can fix that by devising a much more compact encoding of the indices assigned to each process (they could be encoded as just a UnitRange{Int}, using divrem to recover the i, j indices).
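
A minimal sketch of that encoding idea, assuming each worker receives only a range over the cells of the full n×n grid (myfunc_range is a hypothetical name; dataset and f_dist are again assumed to exist on every worker):

    @everywhere function myfunc_range(mat, ks, n)
        # ks is a UnitRange{Int} over the linear indices 0:(n^2 - 1);
        # divrem recovers the 0-based row and column.
        for k in ks
            i, j = divrem(k, n)
            i += 1; j += 1        # shift to 1-based indexing
            i < j || continue     # only compute the upper triangle
            mat[i, j] = mat[j, i] = f_dist(dataset[i], dataset[j])
        end
    end

The driver loop would then split 0:(n^2 - 1) with chunkbreaks instead of splitting combs: a UnitRange{Int} serializes in constant space, so the per-worker message stays tiny no matter how large n gets, at the cost of skipping roughly half the k values.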


Source: https://habr.com/ru/post/1212776/

