I have a network that trains batch normalization (BN) layers. The batch size is 16, which does not fit on a single GPU, so I have to use several GPUs. I followed the Inception v3 multi-GPU example, which can be summarized as:
    with tf.Graph().as_default(), tf.device('/cpu:0'):
        images_splits = tf.split(axis=0, num_or_size_splits=FLAGS.num_gpus, value=images)
        labels_splits = tf.split(axis=0, num_or_size_splits=FLAGS.num_gpus, value=labels)
        for i in range(FLAGS.num_gpus):
            with tf.device('/gpu:%d' % i):
                with tf.name_scope('%s_%d' % (inception.TOWER_NAME, i)) as scope:
                    ...
Unfortunately, that example uses the slim library for its BN layers, while I use the standard tf.contrib.layers.batch_norm:
    def _batch_norm(self, x, name, is_training, activation_fn, trainable=False):
        with tf.variable_scope(name + '/BatchNorm') as scope:
            o = tf.contrib.layers.batch_norm(
                x, scale=True, activation_fn=activation_fn,
                is_training=is_training, trainable=trainable, scope=scope)
            return o
To collect the updates to moving_mean and moving_variance, I used tf.GraphKeys.UPDATE_OPS:
    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    with tf.control_dependencies(update_ops):
        self.train_op = tf.group(train_op_conv, train_op_fc)
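As a sanity check of my understanding, here is a toy snippet (not my real network) showing why I worry about this collection when the same BN layer is built once per tower: each call to batch_norm seems to add its own pair of moving-average update ops, even when the variables are reused.

    # Toy check (my understanding, not verified against my real model):
    # each batch_norm call registers its moving_mean / moving_variance
    # update ops in tf.GraphKeys.UPDATE_OPS, so building the same layer
    # once per tower leaves N copies of the updates in the collection.
    import tensorflow as tf

    x = tf.placeholder(tf.float32, [16, 8])
    with tf.variable_scope('bn_test') as vs:
        y0 = tf.contrib.layers.batch_norm(x, scale=True, is_training=True,
                                          scope='bn')
        vs.reuse_variables()
        # "second tower": same variables, but new update ops are created
        y1 = tf.contrib.layers.batch_norm(x, scale=True, is_training=True,
                                          scope='bn')

    # I expect this to print 4 (two update ops per call), not 2
    print(len(tf.get_collection(tf.GraphKeys.UPDATE_OPS)))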
Finally, the idea of using BN on multiple GPUs is borrowed from Inception v3:
    split_image_batch = tf.split(self.image_batch, self.conf.num_gpus, 0)
    split_label_batch = tf.split(self.label_batch, self.conf.num_gpus, 0)
    global_step = tf.train.get_or_create_global_step()
    opt = tf.train.MomentumOptimizer(self.learning_rate, self.conf.momentum)
    tower_grads_encoder = []
    tower_grads_decoder = []
    update_ops = []
    with tf.variable_scope(tf.get_variable_scope()):
        for i in range(self.conf.num_gpus):
            with tf.device('/gpu:%d' % i):
                net = Resnet(split_image_batch[i], self.conf.num_classes)
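The loop body ends there in my code. For reference, here is a hedged sketch of how I imagine it continuing, following the Inception/CIFAR-10 multi-GPU pattern. The names net.loss(), net.encoder_vars, net.decoder_vars, and average_gradients() are placeholders I made up for this sketch (average_gradients would be the helper from the TensorFlow CIFAR-10 multi-GPU tutorial); they are not real APIs of my Resnet class:

                # --- hypothetical continuation of the loop body above ---
                loss = net.loss(split_label_batch[i])        # placeholder API
                tf.get_variable_scope().reuse_variables()
                if i == 0:
                    # Take the BN moving-average updates of the first tower
                    # only; no other tower has been built yet, and the
                    # moving_mean / moving_variance variables are shared.
                    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
                tower_grads_encoder.append(
                    opt.compute_gradients(loss, var_list=net.encoder_vars))
                tower_grads_decoder.append(
                    opt.compute_gradients(loss, var_list=net.decoder_vars))

    # Back at the outer level: average the per-tower gradients
    # (average_gradients as in the CIFAR-10 multi-GPU example) and
    # group the BN updates with the train op.
    grads_encoder = average_gradients(tower_grads_encoder)
    grads_decoder = average_gradients(tower_grads_decoder)
    train_op_conv = opt.apply_gradients(grads_encoder, global_step=global_step)
    train_op_fc = opt.apply_gradients(grads_decoder)
    with tf.control_dependencies(update_ops):
        self.train_op = tf.group(train_op_conv, train_op_fc)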
Although the code runs without errors, the performance is very low. It seems that I am not collecting the BN parameters correctly. Could you take a look at my code and give me some direction on training BN across multiple GPUs? Thanks.