Is there a way to clip intermediate exploding gradients in TensorFlow?

Problem: a very long RNN

N1 -- N2 -- ... --- N100 

With an optimizer such as AdamOptimizer, compute_gradients() will return gradients for all trainable variables.

However, the gradients can explode at some step in the chain.

A method like the one in how-to-effectively-apply-gradient-clipping-in-tensor-flow can fix a large final gradient.
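For reference, a minimal sketch of that final-gradient clipping, assuming the TF 1.x graph API; `loss` is assumed to be already defined, and the cap of 5.0 is arbitrary:

```python
import tensorflow as tf

# assumes `loss` is already defined on the graph
optimizer = tf.train.AdamOptimizer(learning_rate=1e-3)
grads_and_vars = optimizer.compute_gradients(loss)
grads, variables = zip(*grads_and_vars)
# clip only the final gradients, after the full backward pass has run
clipped, _ = tf.clip_by_global_norm(grads, 5.0)  # 5.0 is an arbitrary cap
train_op = optimizer.apply_gradients(zip(clipped, variables))
```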

But how do you clip the intermediate gradients?

One way could be to run backprop manually: compute the gradient from N100 to N99, clip it, then from N99 to N98, and so on, but that's too complicated.

So my question is: is there an easier way to clip intermediate gradients? (Of course, strictly speaking, they are then no longer mathematical gradients.)

1 answer

You can use the custom_gradient decorator to make a version of tf.identity that clips its gradient in the backward pass.

```python
import tensorflow as tf
from tensorflow.contrib.eager.python import tfe

@tfe.custom_gradient
def gradient_clipping_identity(tensor, max_norm):
    # forward pass: a plain identity
    result = tf.identity(tensor)

    # backward pass: clip the incoming gradient;
    # None is the gradient with respect to max_norm
    def grad(dresult):
        return tf.clip_by_norm(dresult, max_norm), None

    return result, grad
```

Then use gradient_clipping_identity as you would normally use tf.identity, and your gradients will be clipped in the backward pass.
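For illustration, a minimal sketch of wiring this into a manually unrolled RNN; the cell size, `inputs`, `batch_size`, `num_steps`, and the norm cap of 1.0 are all hypothetical:

```python
import tensorflow as tf

# hypothetical unrolled loop; `inputs` has shape [batch_size, num_steps, depth]
cell = tf.nn.rnn_cell.BasicLSTMCell(128)
state = cell.zero_state(batch_size, tf.float32)
outputs = []
for t in range(num_steps):
    output, state = cell(inputs[:, t, :], state)
    # clip the gradient flowing back through this step's output;
    # the recurrent state tensors could be wrapped the same way
    output = gradient_clipping_identity(output, 1.0)
    outputs.append(output)
```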
