I wrote an algorithm using the TensorFlow framework and ran into the problem that tf.train.Optimizer.compute_gradients(loss) returns zeros for all weights. Another problem: if the batch size is larger than about 5, tf.histogram_summary for the weights throws an error saying that some of the values are NaN.
I cannot post a reproducible example here because my code is rather bulky and I am not good enough at TF to make it shorter. I will try to insert some fragments of it here.
The main loop:
images_ph = tf.placeholder(tf.float32, shape=some_shape)
labels_ph = tf.placeholder(tf.float32, shape=some_shape)

output = inference(BATCH_SIZE, images_ph)
loss = loss(labels_ph, output)
train_op = train(loss, global_step)

session = tf.Session()
session.run(tf.initialize_all_variables())

for i in xrange(MAX_STEPS):
    images, labels = train_dataset.get_batch(BATCH_SIZE, yolo.INPUT_SIZE,
                                             yolo.OUTPUT_SIZE)
    session.run([loss, train_op],
                feed_dict={images_ph: images, labels_ph: labels})
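To locate where the NaNs first appear, a check can be added to the run; this is a minimal sketch reusing the session and feed from the loop above (tf.add_check_numerics_ops() asserts on every float tensor in the graph):

# Sketch: fail at the first op that produces a NaN/Inf, instead of
# failing later inside tf.histogram_summary.
check_op = tf.add_check_numerics_ops()
session.run([loss, train_op, check_op],
            feed_dict={images_ph: images, labels_ph: labels})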
The train op (the problem is here):
def train(total_loss, global_step):
    opt = tf.train.AdamOptimizer()
    grads = opt.compute_gradients(total_loss)  # Here the gradients are all zeros
    for grad, var in grads:
        if grad is not None:
            tf.histogram_summary("gradients/" + var.op.name, grad)
    return opt.apply_gradients(grads, global_step=global_step)
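This is roughly how I confirm that the gradients really are zeros (a sketch; it assumes access to the (gradient, variable) pairs from compute_gradients and the feed from the main loop):

# Sketch: evaluate the raw gradient tensors to inspect their values.
grads_and_vars = [(g, v) for g, v in grads if g is not None]
grad_values = session.run([g for g, _ in grads_and_vars],
                          feed_dict={images_ph: images, labels_ph: labels})
for (_, var), value in zip(grads_and_vars, grad_values):
    print(var.op.name, value.min(), value.max())  # prints 0.0 0.0 for every layer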
The loss (it is calculated correctly, since its value varies from sample to sample):
def loss(labels, output):
    return tf.reduce_mean(tf.squared_difference(labels, output))
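In isolation this loss does produce nonzero gradients; here is a minimal self-contained check (all shapes and values are made up for illustration):

import numpy as np
import tensorflow as tf

# Self-contained check: the MSE loss yields nonzero gradients for a
# sigmoid output. Shapes and values are illustrative only.
x = tf.Variable(tf.truncated_normal([4, 8], stddev=0.1))
output = tf.sigmoid(x)
labels = tf.constant(np.random.rand(4, 8).astype(np.float32))
mse = tf.reduce_mean(tf.squared_difference(labels, output))

grads = tf.train.AdamOptimizer().compute_gradients(mse)
sess = tf.Session()
sess.run(tf.initialize_all_variables())
for grad, var in grads:
    if grad is not None:
        print(var.op.name, sess.run(grad))  # nonzero values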
The network: a stack of convolutional layers with ReLU, followed by 3 fully connected layers with a sigmoid activation on the last layer. All weights are initialized from a truncated normal distribution. All labels are fixed-length vectors of real numbers in the range [0, 1].
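For reference, inference is shaped roughly like this (a miniature sketch: every shape and layer size, and the ReLU on the hidden fully connected layers, are placeholders/assumptions rather than my actual values):

import tensorflow as tf

def inference(images):
    # Miniature sketch of the architecture described above; all
    # shapes and layer sizes here are illustrative placeholders.
    w_conv = tf.Variable(tf.truncated_normal([3, 3, 3, 16], stddev=0.1))
    conv = tf.nn.relu(tf.nn.conv2d(images, w_conv,
                                   strides=[1, 1, 1, 1], padding='SAME'))
    flat = tf.reshape(conv, [-1, 8 * 8 * 16])  # assumes 8x8 inputs
    w1 = tf.Variable(tf.truncated_normal([8 * 8 * 16, 64], stddev=0.1))
    h1 = tf.nn.relu(tf.matmul(flat, w1))
    w2 = tf.Variable(tf.truncated_normal([64, 64], stddev=0.1))
    h2 = tf.nn.relu(tf.matmul(h1, w2))
    w3 = tf.Variable(tf.truncated_normal([64, 20], stddev=0.1))
    return tf.sigmoid(tf.matmul(h2, w3))  # outputs in [0, 1]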
Thanks in advance for your help! If you have any hypotheses about my problem, please share them and I will try them out. I can also share all of the code if you want.