What is the difference between the momentum gradient update in TensorFlow and Theano?

I am trying to use TensorFlow for my deep learning project.
I need to implement my gradient update with this formula:

[Formula image: a momentum update with an L2 weight penalty, as implemented in the Theano code below]

I also implemented this update in Theano, and it produced the expected result. But when I try to use TensorFlow's MomentumOptimizer, the result is really bad. I do not know what is different between them.

Theano:

    import theano
    import theano.tensor as T

    def gradient_updates_momentum_L2(cost, params, learning_rate, momentum, weight_cost_strength):
        # Make sure momentum is a sane value
        assert momentum < 1 and momentum >= 0
        # List of update steps for each parameter
        updates = []
        # Just gradient descent on cost
        for param in params:
            param_update = theano.shared(param.get_value()*0., broadcastable=param.broadcastable)
            updates.append((param, param - learning_rate*(param_update + weight_cost_strength * param_update)))
            updates.append((param_update, momentum*param_update + (1. - momentum)*T.grad(cost, param)))
        return updates

TensorFlow:

    import tensorflow as tf

    l2_loss = tf.add_n([tf.nn.l2_loss(v) for v in tf.trainable_variables()])
    cost = cost + WEIGHT_COST_STRENGTH * l2_loss
    train_op = tf.train.MomentumOptimizer(LEARNING_RATE, MOMENTUM).minimize(cost)
1 answer

If you look at the implementation of the momentum optimizer in TensorFlow [link], it is implemented as follows:

    accum = accum * momentum() + grad;
    var -= accum * lr();

As you can see, the formulas are slightly different: unlike your Theano code, TensorFlow does not scale the gradient by (1 - momentum) when it accumulates the momentum term. Rescaling the momentum term by way of your learning rate should resolve the difference.
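To make that concrete, here is a minimal numeric sketch (plain Python, made-up gradient values, the L2 term ignored). Because TensorFlow's accumulator omits the (1 - momentum) factor, it is exactly the Theano-style velocity divided by (1 - momentum), so one way to account for the difference is to multiply the TensorFlow learning rate by (1 - momentum):

    # Minimal numeric sketch (plain Python, hypothetical gradient values).
    momentum, lr = 0.9, 0.01
    grads = [1.0, 0.5, 0.25]          # made-up per-step gradients

    v_theano, accum_tf = 0.0, 0.0
    for g in grads:
        v_theano = momentum * v_theano + (1.0 - momentum) * g   # Theano snippet's rule
        accum_tf = momentum * accum_tf + g                       # TensorFlow kernel's rule
        step_theano = lr * v_theano
        step_tf = (lr * (1.0 - momentum)) * accum_tf             # rescaled learning rate
        print(step_theano, step_tf)                              # the two steps match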

It is also easy to implement such an optimizer yourself in TensorFlow. The resulting code would look much like the Theano snippet you included.
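As a rough illustration, here is a minimal sketch of such a hand-rolled update for TensorFlow 1.x graph mode (the function name and arguments are hypothetical, and it applies the freshly updated velocity rather than the previous one; an L2 penalty can be added to the cost beforehand, as in your own TensorFlow snippet):

    import tensorflow as tf

    def momentum_updates(cost, params, learning_rate, momentum):
        # Gradients of the cost with respect to each parameter.
        grads = tf.gradients(cost, params)
        update_ops = []
        for param, grad in zip(params, grads):
            # One non-trainable velocity slot per parameter, initialized to zeros,
            # playing the role of the Theano shared variable param_update.
            velocity = tf.Variable(tf.zeros(param.get_shape(), dtype=param.dtype.base_dtype),
                                   trainable=False)
            new_velocity = momentum * velocity + (1.0 - momentum) * grad
            update_ops.append(tf.assign(velocity, new_velocity))
            update_ops.append(tf.assign(param, param - learning_rate * new_velocity))
        return tf.group(*update_ops)

Running the returned op in a session once per batch then performs one step of this update; the velocity variables must be created before the variable initializer is run.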
