TensorFlow: How to Set the Learning Rate on a Log Scale, and Some Other TensorFlow Questions

I am new to deep learning and TensorFlow, and I am trying to implement the algorithm from this paper using TensorFlow. The paper implements it with MatConvNet + MATLAB, and I'm curious whether TensorFlow has equivalent functions to achieve the same thing. The paper says:

Network parameters were initialized using the Xavier method [14]. We used the regression loss across four wavelet subbands under an l2 penalty, and the proposed network was trained using stochastic gradient descent (SGD). The regularization parameter (λ) was 0.0001, and the momentum was 0.9. The learning rate ranged from 10^-1 to 10^-4, and was reduced on a log scale at each epoch.

The paper uses the wavelet transform (WT) and a residual learning method (where the residual image = WT(HR) - WT(HR'), and HR' is used for training). The Xavier method proposes initializing the variables from a normal distribution with

    stddev = sqrt(2 / (filter_size * filter_size * num_filters))

Q1. How do I initialize the variables? Is the code correct?

 weights = tf.Variable(tf.random_normal[img_size, img_size, 1, num_filters], stddev=stddev) 

The paper does not describe the loss function in detail, and I cannot find an equivalent TensorFlow function for setting the learning rate on a log scale (there is only exponential_decay). I understand that MomentumOptimizer is equivalent to stochastic gradient descent with momentum.

Q2: Is it possible to set the learning rate on a log scale?

Q3: How do I create the loss function described above?

I referred to this website to write the code below. Suppose the model() function returns the network specified in the paper and lamda = 0.0001:

    inputs = tf.placeholder(tf.float32, shape=[None, patch_size, patch_size, num_channels])
    labels = tf.placeholder(tf.float32, [None, patch_size, patch_size, num_channels])

    # get the model output and the weights of each conv layer
    pred, weights = model()

    # define the loss function
    loss = tf.nn.softmax_cross_entropy_with_logits_v2(labels=labels, logits=pred)
    regularizers = 0.0
    for weight in weights:
        regularizers += tf.nn.l2_loss(weight)
    loss = tf.reduce_mean(loss + 0.0001 * regularizers)

    learning_rate = tf.train.exponential_decay(???)  # not sure if we can have a custom log-scale learning rate
    optimizer = tf.train.MomentumOptimizer(learning_rate, momentum).minimize(loss, global_step)

NOTE: As I am a deep learning / TensorFlow novice, I copy-pasted code from here and there, so please feel free to fix it if you can ;)

+7
python deep-learning tensorflow
3 answers

The other answers are very detailed and helpful. Here is a code example that uses a placeholder to feed a learning rate that is reduced on a log scale. HTH.

    import tensorflow as tf
    import numpy as np

    # simulate some data
    N = 10000
    D = 10
    x = np.random.rand(N, D)
    w = np.random.rand(D, 1)
    y = np.dot(x, w)
    print(y.shape)

    # build the model
    batch_size = 100
    tni = tf.truncated_normal_initializer()
    X = tf.placeholder(tf.float32, [batch_size, D])
    Y = tf.placeholder(tf.float32, [batch_size, 1])
    W = tf.get_variable("w", shape=[D, 1], initializer=tni)
    B = tf.zeros([1])
    lr = tf.placeholder(tf.float32)  # the learning rate is fed in, so we can change it every epoch
    pred = tf.add(tf.matmul(X, W), B)
    print(pred.shape)
    mse = tf.reduce_sum(tf.losses.mean_squared_error(Y, pred))
    opt = tf.train.MomentumOptimizer(lr, 0.9)
    train_op = opt.minimize(mse)

    learning_rate = 0.0001
    acc_err = 0.0
    sess = tf.Session()
    sess.run(tf.global_variables_initializer())
    for i in range(100000):
        if i > 0 and i % N == 0:
            # epoch done, decrease the learning rate by 2
            learning_rate /= 2
            print("Epoch completed. LR =", learning_rate)
        idx = i // batch_size + i % batch_size  # integer index into the data
        f = {X: x[idx:idx + batch_size, :], Y: y[idx:idx + batch_size, :], lr: learning_rate}
        _, err = sess.run([train_op, mse], feed_dict=f)
        acc_err += err
        if i % 5000 == 0:
            print("Average error = {}".format(acc_err / 5000))
            acc_err = 0.0
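The halving above is one possible choice; if you want to follow the paper's schedule of 10^-1 down to 10^-4, the per-epoch factor can instead be computed on a log scale. A minimal sketch, where num_epochs is an assumed value, not something taken from the paper:

    # sketch: per-epoch multiplicative factor that moves the rate from
    # 1e-1 to 1e-4 on a log scale over num_epochs epochs (num_epochs is illustrative)
    num_epochs = 30
    decay_per_epoch = (1e-4 / 1e-1) ** (1.0 / (num_epochs - 1))
    # then, in the epoch branch above, use: learning_rate *= decay_per_epoch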
+2

Q1. How do I initialize the variables? Is the code correct?

This is correct (although the call parentheses are missing: the shape should be passed as an argument to tf.random_normal). You can also look at tf.get_variable if the variables will be reused.
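For example, a minimal sketch with tf.get_variable and the stddev formula from the question (the shapes and names here are illustrative, not taken from the paper):

    import tensorflow as tf

    filter_size = 3      # illustrative values
    num_filters = 64
    stddev = (2.0 / (filter_size * filter_size * num_filters)) ** 0.5

    # reusable variable initialized with the stddev from the question's formula
    weights = tf.get_variable(
        "conv1_weights",
        shape=[filter_size, filter_size, 1, num_filters],
        initializer=tf.truncated_normal_initializer(stddev=stddev))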

Q2: Is it possible to set the learning rate on a log scale?

Exponential decay reduces the learning rate at every step. I think what you want is tf.train.piecewise_constant, with the boundaries set at each epoch, as in the sketch below.
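A minimal sketch of that approach, assuming a hypothetical steps_per_epoch and the 10^-1 to 10^-4 values quoted from the paper (loss and the momentum value come from the question's code):

    # sketch: drop the learning rate by 10x at each epoch boundary
    global_step = tf.Variable(0, trainable=False)
    steps_per_epoch = 1000   # assumption: dataset_size // batch_size
    boundaries = [1 * steps_per_epoch, 2 * steps_per_epoch, 3 * steps_per_epoch]
    values = [1e-1, 1e-2, 1e-3, 1e-4]   # 10^-1 down to 10^-4
    learning_rate = tf.train.piecewise_constant(global_step, boundaries, values)
    optimizer = tf.train.MomentumOptimizer(learning_rate, 0.9).minimize(
        loss, global_step=global_step)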

EDIT: see the other answer; use the staircase=True argument!

Q3: How do I create the loss function described above?

The loss function looks right.

+4

Q1. How do I initialize the variables? Is the code correct?

Use tf.get_variable or switch to slim (it does the initialization for you automatically). Example:
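A minimal sketch of the slim version (the input shape and layer arguments here are illustrative, not from the paper):

    import tensorflow as tf
    import tensorflow.contrib.slim as slim

    # slim initializes the weights for you (xavier by default),
    # or you can pass an initializer explicitly
    inputs = tf.placeholder(tf.float32, [None, 64, 64, 1])  # illustrative shape
    net = slim.conv2d(inputs, 64, [3, 3],
                      weights_initializer=tf.contrib.layers.xavier_initializer(),
                      scope='conv1')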

Q2: Is it possible to set the learning rate on a log scale?

You can, but do you need it? This is not the first thing you need to solve for this network. Please check #3 first.

However, just for reference, you can use the following:

    global_step = tf.Variable(0, trainable=False)
    learning_rate_node = tf.train.exponential_decay(learning_rate=0.001,
                                                    global_step=global_step,
                                                    decay_steps=10000,
                                                    decay_rate=0.98,
                                                    staircase=True)
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate_node).minimize(loss, global_step=global_step)
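For the specific schedule in the paper (10^-1 down to 10^-4, dropping once per epoch), the same call could be parameterized like this; steps_per_epoch is an assumed placeholder, not a value from the paper:

    # sketch: drop by one decade per epoch -> 1e-1, 1e-2, 1e-3, 1e-4
    steps_per_epoch = 1000   # assumption: dataset_size // batch_size
    learning_rate_node = tf.train.exponential_decay(learning_rate=1e-1,
                                                    global_step=global_step,
                                                    decay_steps=steps_per_epoch,
                                                    decay_rate=0.1,
                                                    staircase=True)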

Q3: How do I create the loss function described above?

First of all, you did not pass "pred" through the image reconstruction in this post (based on the paper, you need to apply the subtraction and the IDWT to get the final image).

There is one problem here: the logits should be calculated from your labeled data, i.e. if you use the labeled data as "Y: Label", you need to write:

    pred = model()
    pred = tf.matmul(pred, weights) + biases
    logits = tf.nn.softmax(pred)
    loss = tf.reduce_mean(tf.abs(logits - labels))

This will give you an output comparable to Y: Label.

If your labeled data is an image, then you need to do the following:

    pred = model()
    pred = tf.matmul(image, weights) + biases
    logits = tf.nn.softmax(pred)
    image = apply_IDWT("X: input", logits)  # this will apply IDWT(x_label - y_label)
    loss = tf.reduce_mean(tf.abs(image - labels))

The logits are the output of your network. You will use them to calculate the residual. Instead of the matmul, you can add a conv2d layer here, without batch normalization or an activation function, and set the number of output features to 4. Example:

    pred = model()
    pred = slim.conv2d(pred, 4, [3, 3], activation_fn=None, padding='SAME', scope='output')
    logits = tf.nn.softmax(pred)
    image = apply_IDWT("X: input", logits)  # this will apply IDWT(x_label - y_label)
    loss = tf.reduce_mean(tf.abs(logits - labels))

This loss function will get basic training working for you. However, it is the L1 distance and it can suffer from some problems (check). Consider the following situation:

Say your network outputs the array [10, 10, 10, 0, 0] and you are trying to reach [10, 10, 10, 10, 10]. In this case your loss is 20 (10 + 10), yet you have succeeded on 3/5 of the entries. This can also indicate some overfitting.

Now consider the output [6, 6, 6, 6, 6]. It still has a loss of 20 (4 + 4 + 4 + 4 + 4), but if you apply a threshold of 5, you succeed on 5/5. This is the case we want.

If you use the L2 loss, the first case gives 10^2 + 10^2 = 200 as the loss value, while the second gives 4^2 * 5 = 80. The optimizer will therefore try to escape from case #1 as quickly as possible to achieve success across all outputs, rather than complete success on some outputs and complete failure on others. You can use this loss function for that:

    loss = tf.reduce_mean(tf.nn.l2_loss(logits - image))
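For a quick sanity check of the arithmetic above (plain NumPy, nothing model-specific):

    import numpy as np

    target = np.array([10, 10, 10, 10, 10], dtype=float)
    out1 = np.array([10, 10, 10, 0, 0], dtype=float)
    out2 = np.array([6, 6, 6, 6, 6], dtype=float)

    print(np.abs(out1 - target).sum(), np.abs(out2 - target).sum())    # L1: 20.0 20.0
    print(((out1 - target) ** 2).sum(), ((out2 - target) ** 2).sum())  # L2: 200.0 80.0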

Alternatively, you can look at the cross-entropy loss function (it applies softmax internally, so do not apply softmax twice):

    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=image, logits=pred))

+4
