tf.gradients has a grad_ys parameter that can be used for this purpose. Suppose your network has only one ReLU layer, as follows:
before_relu = f1(inputs, params)
after_relu = tf.nn.relu(before_relu)
loss = f2(after_relu, params, targets)
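For concreteness, here is one way these placeholders could be instantiated; the linear layer, squared loss, and shapes below are illustrative assumptions, not part of the original setup:

import tensorflow as tf

# Illustrative stand-ins for f1 and f2: a single linear layer and a squared loss.
inputs = tf.placeholder(tf.float32, [None, 4])
targets = tf.placeholder(tf.float32, [None, 1])
params = tf.Variable(tf.random_normal([4, 1]))

before_relu = tf.matmul(inputs, params)                  # f1(inputs, params)
after_relu = tf.nn.relu(before_relu)
loss = tf.reduce_mean(tf.square(after_relu - targets))   # f2(after_relu, params, targets)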
First, compute the gradient of the loss with respect to after_relu:
Dafter_relu = tf.gradients(loss, after_relu)[0]
Then threshold the gradient you send backward, zeroing out its negative entries (tf.where is the current name of the older tf.select):
Dafter_relu_thresholded = tf.where(Dafter_relu < 0.0, tf.zeros_like(Dafter_relu), Dafter_relu)
Finally, compute the actual gradients with respect to params, feeding the thresholded gradient in through grad_ys:
Dparams = tf.gradients(after_relu, params, grad_ys=Dafter_relu_thresholded)
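If you want to go further and actually apply these gradients, one possibility (an illustration, not part of the original answer; it assumes params is the single variable from the sketch above) is to hand them to an optimizer:

# tf.gradients returns a list, so pair each gradient with its variable.
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)
train_op = optimizer.apply_gradients([(Dparams[0], params)])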
You can easily extend the same method to a network with many ReLU layers.
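A minimal sketch of that extension, assuming a hypothetical two-ReLU-layer network (the names h1, h2, w1, w2 are illustrative, and inputs/targets are reused from the sketch above): backpropagate one ReLU at a time, thresholding the incoming gradient before passing it on via grad_ys.

# Hypothetical two-ReLU-layer network; names and shapes are illustrative.
w1 = tf.Variable(tf.random_normal([4, 8]))
w2 = tf.Variable(tf.random_normal([8, 1]))
h1 = tf.nn.relu(tf.matmul(inputs, w1))
h2 = tf.nn.relu(tf.matmul(h1, w2))
loss2 = tf.reduce_mean(tf.square(h2 - targets))

def threshold(g):
    # Zero out negative incoming gradients, as in the single-layer case.
    return tf.where(g < 0.0, tf.zeros_like(g), g)

# Backpropagate one ReLU at a time, thresholding at each step.
d_h2 = threshold(tf.gradients(loss2, h2)[0])
d_h1 = threshold(tf.gradients(h2, h1, grad_ys=d_h2)[0])

# Parameter gradients computed from the thresholded signals.
d_w2 = tf.gradients(h2, w2, grad_ys=d_h2)[0]
d_w1 = tf.gradients(h1, w1, grad_ys=d_h1)[0]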