Gradient descent with momentum explained in the Theano tutorial

I read this tutorial on the home page.

I have doubts about the for loop.

You initialize the variable param_update to zero:

param_update = theano.shared(param.get_value()*0., broadcastable=param.broadcastable) 

and then you update its value in the remaining two lines:

 updates.append((param, param - learning_rate*param_update))
 updates.append((param_update, momentum*param_update + (1. - momentum)*T.grad(cost, param)))
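
For context, these two appends sit inside a loop over the model parameters, roughly like this (a reconstructed sketch based on the snippets in this post, not a verbatim copy of the tutorial code):

 import theano
 import theano.tensor as T

 def gradient_updates_momentum(cost, params, learning_rate, momentum):
     updates = []
     for param in params:
         # One shared "velocity" per parameter, created once and filled with zeros
         param_update = theano.shared(param.get_value()*0.,
                                      broadcastable=param.broadcastable)
         # Move the parameter along the stored update direction ...
         updates.append((param, param - learning_rate*param_update))
         # ... and blend the current gradient into that direction (momentum rule)
         updates.append((param_update,
                         momentum*param_update + (1. - momentum)*T.grad(cost, param)))
     return updates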

Why do we need this initialization if the value is immediately replaced by the updates?

I think I am missing something here. Can you help me?

Tags: python, numpy, theano, deep-learning, gradient-descent
1 answer

Initializing param_update with theano.shared(...) only tells Theano to allocate a shared variable that will be used by Theano functions. This initialization code is executed only once, at graph-construction time; it is not run again later to reset param_update to 0.
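
You can see the same behaviour with a toy accumulator (the names acc and accumulate are made up for illustration): the shared variable gets its value of 0 exactly once, and every call of the compiled function only applies the update pair, never the initializer.

 import numpy as np
 import theano
 import theano.tensor as T

 # Allocated and set to 0 exactly once, when this line runs
 acc = theano.shared(np.float64(0.))
 x = T.dscalar('x')

 # Every call applies the update acc -> acc + x; the initial 0 is never re-applied
 accumulate = theano.function([x], acc, updates=[(acc, acc + x)])

 accumulate(1.)
 accumulate(2.)
 print(acc.get_value())   # 3.0, not reset to 0 between calls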

The actual value of param_update is updated according to the last line

 updates.append((param_update, momentum*param_update + (1. - momentum)*T.grad(cost, param))) 

once the train function has been built with this list of updates as an argument ([23] in the tutorial):

 train = theano.function([mlp_input, mlp_target], cost, updates=gradient_updates_momentum(cost, mlp.params, learning_rate, momentum)) 

Each time train is called, Theano computes the gradient of the cost with respect to param and updates param_update to a new update direction according to the momentum rule. param itself is then moved along the update direction stored in param_update, scaled by learning_rate. (Both update expressions are evaluated with the values the shared variables had before the call, so param moves along the previously stored direction.)
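
A plain-NumPy sketch of the same rule on a made-up one-dimensional problem (minimizing w**2; w and w_update stand in for param and param_update) may make the interplay of the two updates clearer:

 import numpy as np

 learning_rate, momentum = 0.1, 0.9
 w = 5.0           # plays the role of param
 w_update = 0.0    # plays the role of param_update, set to 0 only once

 for step in range(200):
     grad = 2.0 * w                                            # T.grad(cost, param) for cost = w**2
     # Theano evaluates both updates from the pre-call values,
     # which is what this ordering reproduces:
     w = w - learning_rate * w_update                          # first append
     w_update = momentum * w_update + (1. - momentum) * grad   # second append

 print(w)   # approaches 0, the minimum of w**2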
