Loss functions in TensorFlow (with if-else)

I am trying to use various loss functions in TensorFlow.

The loss function I want is an epsilon-insensitive function (applied componentwise):

    if |yData - yModel| < epsilon:  loss = 0
    else:                           loss = |yData - yModel|
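
(For reference, the componentwise if-else can also be written directly with an elementwise select. This is only a sketch; it assumes yData, yModel, and epsilon as defined in my attempt below, and that tf.where is available — very old releases call it tf.select.)

    # Sketch: literal elementwise if-else for the epsilon-insensitive loss.
    err = tf.abs(yData - yModel)
    loss = tf.where(err < epsilon, tf.zeros_like(err), err)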

I tried this solution:

    yData = tf.placeholder("float", [None, numberOutputs])
    yModel = model(...)
    epsilon = 0.2
    epsilonTensor = epsilon * tf.ones_like(yData)
    loss = tf.maximum(tf.abs(yData - yModel) - epsilonTensor, tf.zeros_like(yData))
    optimizer = tf.train.GradientDescentOptimizer(0.25)
    train = optimizer.minimize(loss)

I also used

    optimizer = tf.train.MomentumOptimizer(0.001, 0.9)

I see no errors in the implementation; however, it does not converge. On the other hand, loss = tf.square(yData - yModel) converges, and loss = tf.maximum(tf.square(yData - yModel) - epsilonTensor, tf.zeros_like(yData)) also converges.

So I also tried something simpler, loss = tf.abs(yData - yModel), and it does not converge either. Am I doing something wrong, or is the problem the non-differentiability of abs at zero, or something else? What is going on with the abs function?

1 answer

When your loss is something like Loss(x) = abs(x - y), the solution is an unstable fixed point of SGD: start minimizing from a point arbitrarily close to the solution, and the very next step will increase the loss.

Having a stable fixed point is a requirement for the convergence of an iterative procedure such as SGD. In practice, this means the optimization will drift towards a local minimum, but once it gets close enough, it keeps jumping around the solution in steps proportional to the learning rate. Here is a toy TensorFlow program that illustrates the problem:

    import tensorflow as tf
    from matplotlib import pyplot

    x = tf.Variable(0.)
    loss_op = tf.abs(x - 1.05)
    opt = tf.train.GradientDescentOptimizer(0.1)
    train_op = opt.minimize(loss_op)
    sess = tf.InteractiveSession()
    sess.run(tf.initialize_all_variables())
    xvals = []
    for i in range(20):
        unused, loss, xval = sess.run([train_op, loss_op, x])
        xvals.append(xval)
    pyplot.plot(xvals)

[Graph of the x values: after getting close to 1.05, they keep bouncing around the solution.]

Some solutions to the problem:

  • Use a more robust solver, such as a proximal gradient method
  • Use a loss function that is friendlier to SGD, such as the Huber loss (see the sketch after this list)
  • Use a learning-rate schedule to gradually decrease the learning rate.
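
Here is a minimal sketch of option (2) on the same toy problem. It hand-rolls the Huber loss from basic ops rather than using a library helper, assumes a TF 1.x-style API where tf.where is available (very old releases call it tf.select), and the delta of 0.5 is an arbitrary choice for illustration:

    import tensorflow as tf
    from matplotlib import pyplot

    def huber(residual, delta=0.5):
        # Quadratic inside [-delta, delta], linear outside; the gradient
        # shrinks to zero as the residual approaches zero.
        abs_r = tf.abs(residual)
        quadratic = 0.5 * tf.square(residual)
        linear = delta * (abs_r - 0.5 * delta)
        return tf.where(abs_r <= delta, quadratic, linear)

    x = tf.Variable(0.)
    loss_op = huber(x - 1.05)
    train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss_op)

    sess = tf.InteractiveSession()
    sess.run(tf.initialize_all_variables())
    xvals = []
    for i in range(40):
        unused, loss, xval = sess.run([train_op, loss_op, x])
        xvals.append(xval)
    pyplot.plot(xvals)

Because the loss becomes quadratic near the solution, the gradient (and hence the step size) shrinks as x approaches 1.05, so plain SGD settles instead of bouncing.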

Here is a way to implement option (3) on the toy problem above:

    import tensorflow as tf
    from matplotlib import pyplot

    x = tf.Variable(0.)
    loss_op = tf.abs(x - 1.05)
    step = tf.Variable(0)
    learning_rate = tf.train.exponential_decay(
        0.2,   # Base learning rate.
        step,  # Current index into the dataset.
        1,     # Decay step.
        0.9    # Decay rate.
    )
    opt = tf.train.GradientDescentOptimizer(learning_rate)
    train_op = opt.minimize(loss_op, global_step=step)
    sess = tf.InteractiveSession()
    sess.run(tf.initialize_all_variables())
    xvals = []
    for i in range(40):
        unused, loss, xval = sess.run([train_op, loss_op, x])
        xvals.append(xval)
    pyplot.plot(xvals)

[Graph of the x values: with the decaying learning rate they settle at the solution.]
