I wrote an algorithm using the TensorFlow framework and ran into the problem that tf.train.Optimizer.compute_gradients(loss) returns zeros for all weights. Another problem: if the batch size is larger than about 5, tf.histogram_summary for the weights throws an error saying that some of the values are NaN.
I cannot post a reproducible example here because my code is rather bulky and I am not good enough at TF to make it shorter. I will try to insert some fragments of it here.
The main loop:
images_ph = tf.placeholder(tf.float32, shape=some_shape)
labels_ph = tf.placeholder(tf.float32, shape=some_shape)

output = inference(BATCH_SIZE, images_ph)
loss = loss(labels_ph, output)
train_op = train(loss, global_step)

session = tf.Session()
session.run(tf.initialize_all_variables())

for i in xrange(MAX_STEPS):
    images, labels = train_dataset.get_batch(BATCH_SIZE, yolo.INPUT_SIZE,
                                             yolo.OUTPUT_SIZE)
    session.run([loss, train_op],
                feed_dict={images_ph: images, labels_ph: labels})
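To locate where the NaNs first appear, a check can be added to the run; this is a minimal sketch reusing the session and feed from the loop above (tf.add_check_numerics_ops() asserts on every float tensor in the graph):

# Sketch: fail at the first op that produces a NaN/Inf, instead of
# failing later inside tf.histogram_summary.
check_op = tf.add_check_numerics_ops()
session.run([loss, train_op, check_op],
            feed_dict={images_ph: images, labels_ph: labels})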
The train op (the problem is here):
def train(total_loss, global_step):
    opt = tf.train.AdamOptimizer()
    grads = opt.compute_gradients(total_loss)  # Here the gradients are all zeros
    for grad, var in grads:
        if grad is not None:
            tf.histogram_summary("gradients/" + var.op.name, grad)
    return opt.apply_gradients(grads, global_step=global_step)
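This is roughly how I confirm that the gradients really are zeros (a sketch; it assumes access to the (gradient, variable) pairs from compute_gradients and the feed from the main loop):

# Sketch: evaluate the raw gradient tensors to inspect their values.
grads_and_vars = [(g, v) for g, v in grads if g is not None]
grad_values = session.run([g for g, _ in grads_and_vars],
                          feed_dict={images_ph: images, labels_ph: labels})
for (_, var), value in zip(grads_and_vars, grad_values):
    print(var.op.name, value.min(), value.max())  # prints 0.0 0.0 for every layer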
The loss (it is calculated correctly, since its value varies from sample to sample):
def loss(labels, output):
    return tf.reduce_mean(tf.squared_difference(labels, output))
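In isolation this loss does produce nonzero gradients; here is a minimal self-contained check (all shapes and values are made up for illustration):

import numpy as np
import tensorflow as tf

# Self-contained check: the MSE loss yields nonzero gradients for a
# sigmoid output. Shapes and values are illustrative only.
x = tf.Variable(tf.truncated_normal([4, 8], stddev=0.1))
output = tf.sigmoid(x)
labels = tf.constant(np.random.rand(4, 8).astype(np.float32))
mse = tf.reduce_mean(tf.squared_difference(labels, output))

grads = tf.train.AdamOptimizer().compute_gradients(mse)
sess = tf.Session()
sess.run(tf.initialize_all_variables())
for grad, var in grads:
    if grad is not None:
        print(var.op.name, sess.run(grad))  # nonzero values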
The network: a stack of convolutional layers with ReLU, followed by 3 fully connected layers with a sigmoid activation on the last layer. All weights are initialized from a truncated normal distribution. All labels are fixed-length vectors of real numbers in the range [0, 1].
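For reference, inference is shaped roughly like this (a miniature sketch: every shape and layer size, and the ReLU on the hidden fully connected layers, are placeholders/assumptions rather than my actual values):

import tensorflow as tf

def inference(images):
    # Miniature sketch of the architecture described above; all
    # shapes and layer sizes here are illustrative placeholders.
    w_conv = tf.Variable(tf.truncated_normal([3, 3, 3, 16], stddev=0.1))
    conv = tf.nn.relu(tf.nn.conv2d(images, w_conv,
                                   strides=[1, 1, 1, 1], padding='SAME'))
    flat = tf.reshape(conv, [-1, 8 * 8 * 16])  # assumes 8x8 inputs
    w1 = tf.Variable(tf.truncated_normal([8 * 8 * 16, 64], stddev=0.1))
    h1 = tf.nn.relu(tf.matmul(flat, w1))
    w2 = tf.Variable(tf.truncated_normal([64, 64], stddev=0.1))
    h2 = tf.nn.relu(tf.matmul(h1, w2))
    w3 = tf.Variable(tf.truncated_normal([64, 20], stddev=0.1))
    return tf.sigmoid(tf.matmul(h2, w3))  # outputs in [0, 1]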
Thanks in advance for your help! If you have any hypotheses about my problem, please share them and I will try them out. I can also share all of the code if you want.