How to reduce memory consumption in a loop in TensorFlow?

I have a loop in TensorFlow that looks like this:

    with tf.device("/gpu:1"):
        losses = []
        for target, output in zip(targets, lstm_outputs):
            logits = tf.matmul(W, output) + b
            loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits, target)
            losses.append(loss)
        total_loss = tf.add_n(losses)

I get an OOM error when computing the gradients for this layer, since each matrix multiplication is a separate operation in the graph that takes memory. Is there a way to prevent TensorFlow from allocating all of these operations at the same time?

1 answer

This is a challenging graph for TensorFlow to optimize, since the activations from each step must be kept in memory to aggregate a single gradient for W. One possibility is to pass the experimental aggregation_method argument when calling optimizer.minimize().

For example, you can try the following:

    optimizer = tf.train.AdagradOptimizer(...)  # Or another optimization algorithm.
    train_op = optimizer.minimize(
        total_loss,
        aggregation_method=tf.AggregationMethod.EXPERIMENTAL_ACCUMULATE_N)

This option aggregates the gradients for recurrently used variables eagerly, rather than keeping them all in memory until all of the gradients have been computed. If this does not work, tf.AggregationMethod.EXPERIMENTAL_TREE may work better.
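
Putting the question and the answer together, here is a minimal end-to-end sketch. It assumes the TF 1.x graph-mode API (where the loss op takes keyword arguments); the sizes, placeholders, and learning rate are hypothetical stand-ins, and the matmul is written as output x W so the logits have the [batch_size, num_classes] shape the loss op expects:

    import tensorflow as tf

    # Hypothetical sizes; not from the original post.
    num_steps, batch_size, hidden_size, num_classes = 20, 32, 128, 10000

    # Stand-ins for the per-step LSTM outputs and integer class targets.
    lstm_outputs = [tf.placeholder(tf.float32, [batch_size, hidden_size])
                    for _ in range(num_steps)]
    targets = [tf.placeholder(tf.int32, [batch_size])
               for _ in range(num_steps)]

    with tf.device("/gpu:1"):
        W = tf.Variable(tf.random_normal([hidden_size, num_classes]))
        b = tf.Variable(tf.zeros([num_classes]))

        losses = []
        for target, output in zip(targets, lstm_outputs):
            # [batch_size, num_classes] logits for this timestep.
            logits = tf.matmul(output, W) + b
            loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
                labels=target, logits=logits)
            losses.append(loss)
        total_loss = tf.reduce_mean(tf.add_n(losses))

    optimizer = tf.train.AdagradOptimizer(learning_rate=0.1)
    # Accumulate the per-step gradient contributions for W eagerly,
    # instead of keeping every partial gradient alive until the final sum.
    train_op = optimizer.minimize(
        total_loss,
        aggregation_method=tf.AggregationMethod.EXPERIMENTAL_ACCUMULATE_N)
    # If this still runs out of memory, try
    # tf.AggregationMethod.EXPERIMENTAL_TREE instead.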
