I am implementing probabilistic matrix factorization models in Theano and would like to use the Adam gradient-descent rule.
My goal is to keep the code as uncluttered as possible, which means I don't want to explicitly track the "m" and "v" quantities of Adam's algorithm.
This seems possible, especially after seeing how Lasagne's Adam is implemented: it hides the values of "m" and "v" inside the update rules of theano.function.
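For reference, the bookkeeping that `lasagne.updates.adam` hides can be sketched in NumPy as below. The optimizer object, not the caller, owns one "m" buffer and one "v" buffer per parameter array, much as Lasagne creates a moment variable next to each parameter inside the update dictionary it returns. The class name `Adam` and its `step` method are hypothetical; the update uses the bias-corrected step size from the Adam paper.

```python
import numpy as np


class Adam:
    """Minimal Adam optimizer that owns its moment buffers (sketch only)."""

    def __init__(self, params, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        # params: list of float NumPy arrays, updated in place by step().
        self.params = params
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.t = 0
        # One first-moment ("m") and one second-moment ("v") buffer per array.
        self.m = [np.zeros_like(p) for p in params]
        self.v = [np.zeros_like(p) for p in params]

    def step(self, grads):
        """Apply one Adam update given one gradient array per parameter."""
        self.t += 1
        # Bias-corrected step size.
        a = self.lr * np.sqrt(1 - self.beta2 ** self.t) / (1 - self.beta1 ** self.t)
        for p, g, m, v in zip(self.params, grads, self.m, self.v):
            m *= self.beta1
            m += (1 - self.beta1) * g
            v *= self.beta2
            v += (1 - self.beta2) * g * g
            p -= a * m / (np.sqrt(v) + self.eps)


# Usage: minimize (x - 3)**2; the caller never touches m or v.
x = np.array([0.0])
opt = Adam([x], lr=0.1)
for _ in range(500):
    opt.step([2 * (x - 3.0)])
```

The point of the design is that the moment state lives with the parameter it belongs to, so whoever calls `step` only supplies gradients.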
This works when the negative log-likelihood is made up of terms that each involve a different quantity. In probabilistic matrix factorization, however, each term contains the dot product of one latent user vector and one latent item vector. So if I create an instance of Lasagne's Adam for each term, I will end up with several "m" and "v" values for the same latent vector, which is not how Adam is supposed to work.
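The structure described above can be made concrete with a toy NumPy sketch of one PMF term, assuming the usual squared-error likelihood with an L2 penalty on the two latent vectors involved (the ratings, sizes and `lam` are hypothetical). The gradient of a single term is non-zero only in row `U[u]` and row `V[i]`, while the same row shows up in several terms.

```python
from collections import Counter

import numpy as np

# Hypothetical toy data: (user, item, rating) triples.
ratings = [(0, 0, 5.0), (0, 2, 3.0), (1, 0, 4.0), (2, 1, 1.0)]
n_users, n_items, k = 3, 3, 2
rng = np.random.default_rng(0)
U = 0.1 * rng.standard_normal((n_users, k))  # latent user vectors
V = 0.1 * rng.standard_normal((n_items, k))  # latent item vectors
lam = 0.1  # hypothetical L2 weight


def term_grad(u, i, r):
    """Gradient of 0.5*(r - U[u].V[i])**2 + 0.5*lam*(|U[u]|**2 + |V[i]|**2).

    Only row U[u] and row V[i] receive a non-zero gradient.
    """
    err = r - U[u] @ V[i]
    gU = np.zeros_like(U)
    gV = np.zeros_like(V)
    gU[u] = -err * V[i] + lam * U[u]
    gV[i] = -err * U[u] + lam * V[i]
    return gU, gV


# How many terms touch each latent row: a separate Adam per term would keep
# that many independent (m, v) pairs for the same row.
user_terms = Counter(u for u, _, _ in ratings)
item_terms = Counter(i for _, i, _ in ratings)
```

Here `U[0]` and `V[0]` each occur in two terms, so one Adam per term would give each of them two unrelated sets of moments.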
I also posted on the Lasagne group, actually twice, where there are a few more details and some examples.
I thought of two possible implementations:
- each observed rating (that is, each term of the global NLL objective) gets its own Adam, updated by a separate call to a theano.function. Unfortunately, this misuses Adam: the same latent vector ends up associated with several different "m" and "v" values, which is not how Adam is supposed to work.
- a single Adam call over the entire NLL objective, which makes the update behave like plain (full-batch) gradient descent instead of SGD, with all its known drawbacks (long computation time, getting stuck in local minima, etc.).
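One arrangement that sits between the two options can be sketched in NumPy under stated assumptions (this is not Lasagne's API, and the synthetic data, sizes, `lam`, learning rate and batch size are all hypothetical): keep a single "m"/"v" pair per parameter matrix, so every latent row has exactly one set of Adam moments, and feed it the gradient of a minibatch of ratings rather than of one term or of the whole NLL. In Theano terms this would roughly correspond to one `lasagne.updates.adam` call over a minibatch loss on shared `U` and `V`, compiled once and called with different rating indices.

```python
import numpy as np


def adam_update(p, g, m, v, t, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    """One in-place Adam step on array p using persistent buffers m, v."""
    m *= b1
    m += (1 - b1) * g
    v *= b2
    v += (1 - b2) * g * g
    a = lr * np.sqrt(1 - b2 ** t) / (1 - b1 ** t)
    p -= a * m / (np.sqrt(v) + eps)


def minibatch_grads(U, V, users, items, r, lam):
    """Gradient of the squared-error NLL (+ L2) on one minibatch of ratings.

    np.add.at sums contributions when a user or item occurs several times
    in the batch, so each latent row gets one accumulated gradient.
    """
    err = r - np.sum(U[users] * V[items], axis=1)  # shape (batch,)
    gU = np.zeros_like(U)
    gV = np.zeros_like(V)
    np.add.at(gU, users, -err[:, None] * V[items] + lam * U[users])
    np.add.at(gV, items, -err[:, None] * U[users] + lam * V[items])
    return gU, gV


rng = np.random.default_rng(0)
n_users, n_items, k, n_obs = 30, 20, 3, 300
# Hypothetical synthetic ratings generated from a rank-k model.
U_true = rng.standard_normal((n_users, k))
V_true = rng.standard_normal((n_items, k))
users = rng.integers(0, n_users, n_obs)
items = rng.integers(0, n_items, n_obs)
r = np.sum(U_true[users] * V_true[items], axis=1)

U = 0.1 * rng.standard_normal((n_users, k))
V = 0.1 * rng.standard_normal((n_items, k))
# Exactly one (m, v) pair per parameter matrix, hence one per latent row.
mU, vU = np.zeros_like(U), np.zeros_like(U)
mV, vV = np.zeros_like(V), np.zeros_like(V)
lam, batch, t = 0.01, 32, 0


def rmse():
    return np.sqrt(np.mean((r - np.sum(U[users] * V[items], axis=1)) ** 2))


start = rmse()
for epoch in range(50):
    order = rng.permutation(n_obs)
    for s in range(0, n_obs, batch):
        idx = order[s:s + batch]
        gU, gV = minibatch_grads(U, V, users[idx], items[idx], r[idx], lam)
        t += 1
        adam_update(U, gU, mU, vU, t)
        adam_update(V, gV, mV, vV, t)
print(f"RMSE {start:.3f} -> {rmse():.3f}")
```

Rows that do not occur in a minibatch receive a zero gradient, but with dense buffers like these their "m" keeps decaying and still moves them slightly; that is a property of keeping dense Adam state, not something specific to this sketch.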
My questions:
Is there something I have misunderstood about how Lasagne's Adam works?
Would option 2 really behave like SGD, in the sense that each update of a latent vector influences the next update (within the same Adam call) that uses this vector?
Do you have any other suggestions on how to implement it?
Any idea how to solve this problem while avoiding manually storing replicated vectors and matrices for the "v" and "m" values?
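On the premise of the second question: as far as Theano's documented semantics go, all update expressions passed to a `theano.function` are evaluated from the values held before the call and only then written back, so one call is a single simultaneous step. The plain-Python toy below (two hypothetical quadratic terms sharing one scalar `x`, plain gradient steps instead of Adam for brevity) shows how such a simultaneous step differs from an SGD-style sequence in which the second term sees the already-updated value.

```python
# Two hypothetical NLL terms that both depend on the same latent value x.
def grad_term_a(x):
    return 2 * (x - 1.0)  # gradient of (x - 1)**2


def grad_term_b(x):
    return 2 * (x + 1.0)  # gradient of (x + 1)**2


lr = 0.1
x0 = 2.0

# Simultaneous: both gradients use the pre-update x, as in one function call.
x_sim = x0 - lr * (grad_term_a(x0) + grad_term_b(x0))

# Sequential (SGD-style): the second term sees the already-updated x.
x_seq = x0 - lr * grad_term_a(x0)
x_seq = x_seq - lr * grad_term_b(x_seq)
```

With these numbers the simultaneous step gives 1.2 and the sequential one 1.24, so the two schemes are genuinely different updates.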