I was looking at some sample code for processing gradients in TensorFlow:
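It was essentially the standard compute_gradients / apply_gradients pattern from the TF docs (sketched here from memory; loss, var_list and MyCapper are placeholders):

import tensorflow as tf

# create an optimizer
opt = tf.train.GradientDescentOptimizer(learning_rate=0.1)

# compute the gradients for a list of variables
grads_and_vars = opt.compute_gradients(loss, var_list)

# grads_and_vars is a list of (gradient, variable) pairs;
# do whatever you need to the 'gradient' part
capped_grads_and_vars = [(MyCapper(gv[0]), gv[1]) for gv in grads_and_vars]

# ask the optimizer to apply the processed gradients
train_step = opt.apply_gradients(capped_grads_and_vars)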
However, I noticed that apply_gradients was obtained from GradientDescentOptimizer. Does this mean that with the sample code above one can only apply gradient-descent-style update rules (note that we could change opt = GradientDescentOptimizer to Adam or any other optimizer)? In particular, what does apply_gradients actually do? I did check the code on the TF GitHub page, but it was a bunch of Python that had little to do with mathematical update expressions, so it was hard to tell what it does and how it changes from optimizer to optimizer.
For example, if I wanted to implement my own custom optimizer that still uses gradients (or one that doesn't use gradients at all and instead changes the weights directly with some rule, maybe a more biologically plausible one), is that impossible with the code example above?
In particular, I wanted to implement a version of gradient descent that is artificially restricted to a compact domain. Specifically, I wanted to implement the following equation:
w := (w - mu*grad + eps) mod B
in TensorFlow. I realized that the following identity holds (note that the outer mod is needed):

w := (w mod B - mu*grad mod B + eps mod B) mod B
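(A quick sanity check of that identity in plain numpy, with made-up numbers:)

import numpy as np

w, mu, grad, eps, B = 7.3, 0.5, 2.0, 0.1, 5.0
lhs = (w - mu * grad + eps) % B
rhs = ((w % B) - ((mu * grad) % B) + (eps % B)) % B  # the outer mod is what makes it hold
print(np.isclose(lhs, rhs))  # True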
so I thought I could implement the gradient-plus-noise term by defining:
def Process_grads(g, mu_noise, stddev_noise, B):
    # add Gaussian noise to the gradient, then wrap it into [0, B)
    return (g + tf.random_normal(tf.shape(g), mean=mu_noise, stddev=stddev_noise)) % B
and then just:
processed_grads_and_vars = [(Process_grads(gv[0], mu_noise, stddev_noise, B), gv[1])
                            for gv in grads_and_vars]
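and then hand them back to the optimizer (a sketch, reusing opt and grads_and_vars from the first snippet):

# apply the noisy, wrapped gradients instead of the raw ones
train_step = opt.apply_gradients(processed_grads_and_vars)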
However, I realized that this was not enough, because I do not actually have access to w, so I cannot implement:
w mod B
at least not the way I tried. Is there any way to do this, i.e. to actually change the update rule, at least in the way I attempted?
I know the update rule I came up with is hacky; my point is to change the update equation rather than to defend this particular rule (so don't get hung up on it if it seems a little weird).
I came up with a super hacky solution:
def manual_update_GDL(arg, learning_rate, g, mu_noise, stddev_noise):
    with tf.variable_scope(arg.mdl_scope_name, reuse=True):
        W_var = tf.get_variable(name='W')
        eps = tf.random_normal(tf.shape(g), mean=mu_noise, stddev=stddev_noise)
        # manually apply w := (w - mu*grad + eps) mod B
        # (assuming arg also carries the bound B, and sess is the enclosing session)
        W_new = tf.mod(W_var - learning_rate * g + eps, arg.B)
        sess.run(W_var.assign(W_new))
I'm not sure if it works, but something like this should. The idea is to simply write out the equation you want to use (in TensorFlow), learning rate and all, and then update the weights manually through a session.
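For concreteness, I imagined calling it once per step, something like this (num_iters is hypothetical, and fetching the gradient may need a feed_dict):

for i in range(num_iters):
    # fetch the current numeric gradient of the loss w.r.t. W ...
    g = sess.run(grads_and_vars[0][0])
    # ... and apply the custom update manually
    manual_update_GDL(arg, learning_rate, g, mu_noise, stddev_noise)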
Unfortunately, such a solution means we have to take care of annealing the learning rate manually, which seems annoying. This solution probably has many other problems too; feel free to point them out (and give solutions if you can).
For this very specific problem, I realized that one can just run a normal optimizer update step and then simply take the mod of the weights and reassign them:
sess.run(fetches=train_step)
if arg.compact:
    # wrap the weights back into [0, B); W_var as fetched in manual_update_GDL
    sess.run(W_var.assign(tf.mod(W_var, arg.B)))
but in this case it is a coincidence that such a simple solution exists (unfortunately, it bypasses the whole point of my question).
In fact, these solutions significantly slow down the code. At the moment, this is the best I have.
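I suspect part of the slowdown is that tf.mod and assign build new graph ops on every iteration; one mitigation (a sketch, under the same assumptions as above) would be to construct the wrap-around op once, before the loop:

# build the projection op once, outside the training loop
wrap_W = W_var.assign(tf.mod(W_var, arg.B))

for i in range(num_iters):
    sess.run(train_step)  # usual optimizer update
    if arg.compact:
        sess.run(wrap_W)  # wrap the weights back into [0, B)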
As a reference, I have seen this question: How to create an optimizer in Tensorflow, but it did not directly answer my question.