I am interested in using Julia's SharedArray for a scientific computing project. My current implementation calls BLAS for all matrix-vector operations, but I thought SharedArray might offer some speedup on multi-core machines. My idea is simply to update the output vector index by index, farming the index updates out to worker processes.
Previous discussions here about SharedArray and here about shared memory objects have not given clear guidance on this issue. It seems intuitively simple, but after testing I am somewhat confused about why this approach works so poorly (see the code below). For starters, @parallel for seems to allocate a lot of memory. And if I prefix the loop with @sync, which seems reasonable if the entire output vector is needed later, the parallel loop is significantly slower (although without @sync the loop is very fast).
Am I misunderstanding the correct use of the SharedArray object? Or have I assigned the calculations inefficiently?
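Roughly, the test I have in mind looks like the minimal sketch below; the variable names (A, x, y) and the problem size n are placeholders, and it uses the pre-1.0 @parallel / SharedArray syntax referred to above (in current Julia, @parallel has become Distributed.@distributed):

```julia
# Minimal sketch, not the exact benchmark script -- names and sizes are placeholders.
addprocs(4)                    # 4 workers + 1 master node

n = 2_000
A = SharedArray(Float64, (n, n))
x = SharedArray(Float64, n)
y = SharedArray(Float64, n)
rand!(sdata(A))
rand!(sdata(x))

# Baseline: a single BLAS matrix-vector product on the underlying arrays.
y_blas = zeros(n)
@time A_mul_B!(y_blas, sdata(A), sdata(x))

# SharedArray version: farm the output indices out to the workers.
@time @sync @parallel for i in 1:n
    y[i] = dot(vec(A[i, :]), sdata(x))
end
println(isapprox(sdata(y), y_blas))    # check the two results agree

# Without @sync, @parallel returns before the workers have finished,
# so the reported time looks very small.
fill!(sdata(y), 0.0)
@time @parallel for i in 1:n
    y[i] = dot(vec(A[i, :]), sdata(x))
end
```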
Output from the test with 4 workers + 1 master node:
elapsed time: 0.109010169 seconds (80 bytes allocated)
elapsed time: 0.110858551 seconds (80 bytes allocated)
true
elapsed time: 1.726231048 seconds (119936 bytes allocated)
true