Short answer: do not use Adadelta
Very few people use it today; instead, stick to:
- tf.train.MomentumOptimizer with a momentum of 0.9 is very standard and works well. The disadvantage is that you have to search for the best learning rate yourself.
- tf.train.RMSPropOptimizer: results are less dependent on a good learning rate. This algorithm is very similar to Adadelta, but in my opinion it works better.
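For reference, a minimal sketch of how these two would be constructed; the learning rates below are just common starting points, not values from the question:

    # Momentum: standard and solid, but you have to tune the learning rate yourself
    momentum_opt = tf.train.MomentumOptimizer(learning_rate=0.01, momentum=0.9)

    # RMSProp: less sensitive to the exact learning rate
    rmsprop_opt = tf.train.RMSPropOptimizer(learning_rate=0.001)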
If you really want to use Adadelta, use the parameters from the paper: learning_rate=1., rho=0.95, epsilon=1e-6. A larger epsilon helps at the start, but be prepared to wait a bit longer than with other optimizers to see convergence.
Note that in the paper they do not even use a learning rate, which is equivalent to keeping it equal to 1.
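For clarity, the same settings spelled out with the keyword arguments of tf.train.AdadeltaOptimizer:

    # Adadelta with the hyperparameters from the paper
    optimizer = tf.train.AdadeltaOptimizer(learning_rate=1., rho=0.95, epsilon=1e-6)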
Long answer
Adadelta has a very slow start. The full algorithm is Algorithm 1 in the ADADELTA paper (Zeiler, 2012).

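In case the figure from the paper does not come through, here is a rough sketch of one Adadelta step for a single scalar parameter, following the paper's update rules (the variable names are my own):

    # rho and epsilon as in the paper: 0.95 and 1e-6
    def adadelta_step(x, g, accum, accum_update, rho=0.95, epsilon=1e-6):
        # 1. Accumulate a running average of squared gradients
        accum = rho * accum + (1 - rho) * g * g
        # 2. Scale the gradient by the ratio of the two running RMS values
        update = -((accum_update + epsilon) ** 0.5 / (accum + epsilon) ** 0.5) * g
        # 3. Accumulate a running average of squared updates
        accum_update = rho * accum_update + (1 - rho) * update * update
        # 4. Apply the update
        x = x + update
        return x, accum, accum_update

Both accumulators start at zero.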
The problem is the running average of squared updates (accum_update in the sketch above):
- At step 0, this running average is zero, so the first update is very small.
- Since the first update is very small, the running average of updates stays very small at the beginning, which is something of a vicious circle.
I think Adadelta works better with larger networks than yours, and after some iterations it should match the performance of RMSProp or Adam.
Here is my code to experiment a bit with the Adadelta optimizer:
    import tensorflow as tf

    v = tf.Variable(10.)
    loss = v * v

    optimizer = tf.train.AdadeltaOptimizer(1., 0.95, 1e-6)
    train_op = optimizer.minimize(loss)

    # Internal state the optimizer keeps for v
    accum = optimizer.get_slot(v, "accum")                # running average of squared gradients
    accum_update = optimizer.get_slot(v, "accum_update")  # running average of squared updates

    sess = tf.Session()
    sess.run(tf.global_variables_initializer())
    for _ in range(10):
        sess.run(train_op)
        print(sess.run([v, accum, accum_update]))
The first 10 lines:
    v        accum      accum_update
    9.994    20.000     0.000001
    9.988    38.975     0.000002
    9.983    56.979     0.000003
    9.978    74.061     0.000004
    9.973    90.270     0.000005
    9.968    105.648    0.000006
    9.963    120.237    0.000006
    9.958    134.077    0.000007
    9.953    147.205    0.000008
    9.948    159.658    0.000009
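To see how slow this start really is, you can swap the optimizer in the same script for plain gradient descent (the learning rate 0.1 below is just an arbitrary choice for this toy loss):

    # Same toy problem with plain gradient descent for comparison
    optimizer = tf.train.GradientDescentOptimizer(0.1)
    train_op = optimizer.minimize(loss)

With that, v drops toward 0 within a few dozen steps, while Adadelta with the settings above has barely moved after 10 iterations.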