I want to parallelize the following simple expression on 2 GPUs: C = A^n + B^n, by computing A^n on GPU 0 and B^n on GPU 1 before summing the results.
In TensorFlow, I would like to:
with tf.device('/gpu:0'):
    An = matpow(A, n)
with tf.device('/gpu:1'):
    Bn = matpow(B, n)
with tf.Session() as sess:
    C = sess.run(An + Bn)
However, since PyTorch is dynamic, I am having trouble doing the same thing. I tried the following, but it takes longer, not less.
with torch.cuda.device(0):
    A = A.cuda()
with torch.cuda.device(1):
    B = B.cuda()
C = matpow(A, n) + matpow(B, n).cuda(0)
I know there is a module, torch.nn.DataParallel, for parallelizing a model across the batch dimension, but here I am trying to do something more basic.
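For reference, here is a runnable sketch of the pattern I have in mind (matpow is assumed here to be a naive repeated matrix multiply; my actual definition may differ). The two-GPU branch relies on CUDA kernel launches being asynchronous, so issuing the two power computations back-to-back should let both devices work in parallel, with synchronization deferred until the sum needs both results:

```python
import torch

def matpow(M, n):
    # Naive repeated matrix multiply: M^n (assumed definition, n >= 1)
    result = M
    for _ in range(n - 1):
        result = result @ M
    return result

A = torch.randn(200, 200)
B = torch.randn(200, 200)
n = 5

if torch.cuda.device_count() >= 2:
    # CUDA kernels are launched asynchronously, so the two matpow
    # calls can overlap on the two devices; moving Bn to cuda:0
    # synchronizes only when its result is actually needed.
    An = matpow(A.to('cuda:0'), n)
    Bn = matpow(B.to('cuda:1'), n)
    C = An + Bn.to('cuda:0')
else:
    # Single-device (or CPU) fallback for machines without 2 GPUs
    C = matpow(A, n) + matpow(B, n)
```

The key point of this sketch is that no explicit session or graph is needed: as long as the host thread does not force a synchronization (e.g. by printing or calling .item()) between the two matpow calls, the launches on the two devices can proceed concurrently.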