This question is about running an nnGraph network on multiple GPUs; it is not about running separate instances of the network.
I am trying to train a network built with nnGraph. The backward diagram is attached. I am trying to run parallelModel (see the code, or Node 9 in the figure) in a multi-GPU setting. If I wrap parallelModel in an nn.Sequential container and then create a DataParallelTable, it works in a multi-GPU configuration (without nnGraph). However, once it is wrapped in nnGraph, I get an error. The backward pass works if I train on a single GPU (changing true to false in the if statements), but in the multi-GPU setup I get the error "gmodule.lua:418: attempt to index local 'gradInput' (a nil value)". I think Node 9 in the backward pass should run on multiple GPUs, but this does not happen. Wrapping the whole nnGraph in a DataParallelTable did not work for me either, but I thought that at least running the internal sequential networks inside a DataParallelTable would work. Is there any other way to split the input data passed to nnGraph so that it runs on multiple GPUs?
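For context, here is a minimal sketch of the setup that works for me without nnGraph: parallelModel inside an nn.Sequential, wrapped in a DataParallelTable. The shapes, the two-GPU list, and the gradOutput table are just placeholders matching the full code further down.

-- Minimal sketch (assumption: this mirrors the non-nnGraph setup that works):
-- parallelModel inside nn.Sequential, wrapped in a DataParallelTable.
require 'nn'
require 'cunn'
require 'cutorch'

local tmodel = nn.Sequential()
tmodel:add(nn.Linear(20, 10))
tmodel:add(nn.Linear(10, 10))

local parallelModel = nn.ParallelTable()
parallelModel:add(tmodel)
parallelModel:add(nn.Identity())
parallelModel:add(nn.Identity())

local container = nn.Sequential()
container:add(parallelModel)

local gpus = torch.range(1, 2):totable()
local dpt = nn.DataParallelTable(1)   -- split the batch along dimension 1
   :add(container, gpus)
local dptModel = dpt:cuda()

local data1 = torch.ones(4, 20):cuda()
local data2 = torch.ones(4, 10):cuda()

-- Forward and backward both work here; gradOutput is a table because
-- parallelModel outputs a table of three tensors.
dptModel:forward({data1, data2, data2})
dptModel:backward({data1, data2, data2},
   { torch.rand(4, 10):cuda(), torch.rand(4, 10):cuda(), torch.rand(4, 10):cuda() })

The full nnGraph version that produces the error is below.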
require 'torch'
require 'nn'
require 'cudnn'
require 'cunn'
require 'cutorch'
require 'nngraph'

data1 = torch.ones(4,20):cuda()
data2 = torch.ones(4,10):cuda()

tmodel = nn.Sequential()
tmodel:add(nn.Linear(20,10))
tmodel:add(nn.Linear(10,10))

-- parallelModel is Node 9 in the figure: the part I want to run on multiple GPUs
parallelModel = nn.ParallelTable()
parallelModel:add(tmodel)
parallelModel:add(nn.Identity())
parallelModel:add(nn.Identity())

model = parallelModel

if true then
   local function sharingKey(m)
      local key = torch.type(m)
      if m.__shareGradInputKey then
         key = key .. ':' .. m.__shareGradInputKey
      end
      return key
   end

   -- Share gradInput for memory efficient backprop
   local cache = {}
   model:apply(function(m)
      local moduleType = torch.type(m)
      if torch.isTensor(m.gradInput) and moduleType ~= 'nn.ConcatTable' then
         local key = sharingKey(m)
         if cache[key] == nil then
            cache[key] = torch.CudaStorage(1)
         end
         m.gradInput = torch.CudaTensor(cache[key], 1, 0)
      end
   end)
end

if true then
   cudnn.fastest = true
   cudnn.benchmark = true

   -- Wrap the model with DataParallelTable, if using more than one GPU
   local gpus = torch.range(1, 2):totable()
   local fastest, benchmark = cudnn.fastest, cudnn.benchmark

   local dpt = nn.DataParallelTable(1, true, true)
      :add(model, gpus)
      :threads(function()
         local cudnn = require 'cudnn'
         cudnn.fastest, cudnn.benchmark = fastest, benchmark
      end)
   dpt.gradInput = nil

   model = dpt:cuda()
end

-- Build the nnGraph around the (DataParallelTable-wrapped) model
newmodel = nn.Sequential()
newmodel:add(model)

input1 = nn.Identity()()
input2 = nn.Identity()()
input3 = nn.Identity()()

out = newmodel({input1, input2, input3})

r1 = nn.NarrowTable(1,2)(out)
r2 = nn.NarrowTable(2,2)(out)

f1 = nn.JoinTable(2)(r1)
f2 = nn.JoinTable(2)(r2)

n1 = nn.Sequential()
n1:add(nn.Linear(20,5))
n2 = nn.Sequential()
n2:add(nn.Linear(20,5))

f11 = n1(f1)
f12 = n2(f2)

foutput = nn.JoinTable(2)({f11, f12})

g = nn.gModule({input1, input2, input3}, {foutput})
g = g:cuda()

-- Forward works; the backward call is where the multi-GPU error appears
g:forward({data1, data2, data2})
g:backward({data1, data2, data2}, torch.rand(4,10):cuda())

The code in the if statements is taken from Facebook's ResNet implementation (fb.resnet.torch).
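To make the alternative I mention above concrete, this is roughly the variant I also tried (a sketch, assuming g is built with the plain parallelModel and no inner DataParallelTable); wrapping the whole gModule this way did not work for me either.

-- Sketch (assumption) of wrapping the entire gModule in a DataParallelTable,
-- with g built around the plain parallelModel (no inner DataParallelTable).
local gpus = torch.range(1, 2):totable()
local dptGraph = nn.DataParallelTable(1)
   :add(g, gpus)
   :threads(function()
      -- each worker thread needs the packages used by the graph
      require 'cudnn'
      require 'nngraph'
   end)
dptGraph = dptGraph:cuda()

dptGraph:forward({data1, data2, data2})
dptGraph:backward({data1, data2, data2}, torch.rand(4, 10):cuda())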
Tags: deep-learning, torch, multi-gpu
Bharat