I am training a model whose input vector is the output of another model. This involves restoring the first model from a checkpoint file while initializing the second model from scratch (using tf.initialize_variables()), all in the same process.
There is a significant amount of code and abstraction, so I will just paste the relevant sections here.
The following is the restore code:
```python
self.variables = [var for var in all_vars if var.name.startswith(self.name)]
self.saver = tf.train.Saver(self.variables, max_to_keep=3)
self.save_path = tf.train.latest_checkpoint(os.path.dirname(self.checkpoint_path))

if should_restore:
    self.saver.restore(self.sess, self.save_path)
else:
    self.sess.run(tf.initialize_variables(self.variables))
```
Each model is scoped within its own graph and session, like this:
```python
self.graph = tf.Graph()
self.sess = tf.Session(graph=self.graph)

with self.sess.graph.as_default():
    # Create variables and ops.
```
All variables in each model are created within a variable_scope context manager.
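To make the structure concrete, here is a minimal sketch of how each model is set up; the class name, the _build hook, and the exact ordering are illustrative, not my actual code:

```python
import tensorflow as tf

class ModelBase(object):
    """Illustrative skeleton of the per-model graph/session/scope setup."""

    def __init__(self, name):
        self.name = name
        self.graph = tf.Graph()
        self.sess = tf.Session(graph=self.graph)
        with self.graph.as_default():
            with tf.variable_scope(self.name):
                self._build()  # creates placeholders, variables and ops
            all_vars = tf.global_variables()
            # Only this model's variables are saved/restored/initialized.
            self.variables = [var for var in all_vars
                              if var.name.startswith(self.name)]
            self.saver = tf.train.Saver(self.variables, max_to_keep=3)

    def _build(self):
        raise NotImplementedError
```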
Feeding works as follows:
- A background thread calls sess.run(inference_op) on the first model, with input = scipy.misc.imread(X), and puts the result into a blocking queue.
- The main training loop reads from that queue and calls sess.run(train_op) on the second model (a rough sketch of this setup is below).
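Roughly, the producer/consumer setup looks like the sketch below; the queue size, the thread wiring, and the feature_model / train_model / input_ph names are assumptions for illustration, not my exact code:

```python
import threading
import Queue  # `queue` on Python 3

import scipy.misc

feed_queue = Queue.Queue(maxsize=8)  # put() blocks when the queue is full

def producer(image_paths):
    """Background thread: run inference on the first (restored) model."""
    for path in image_paths:
        img = scipy.misc.imread(path)
        features = feature_model.sess.run(
            feature_model.inference_op,
            feed_dict={feature_model.input_ph: img})
        feed_queue.put(features)

def train_loop():
    """Main thread: train the second model on the queued features."""
    while True:
        features = feed_queue.get()
        _, loss = train_model.sess.run(
            [train_model.train_op, train_model.loss_op],
            feed_dict={train_model.input_ph: features})

image_paths = []  # filled with the input image filenames

worker = threading.Thread(target=producer, args=(image_paths,))
worker.daemon = True
worker.start()
train_loop()
```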
Problem:
I observe that the loss values, even in the very first iteration of training the second model, keep changing across runs (and become NaN within a few iterations). I confirmed that the output of the first model is exactly the same every time. If I comment out the sess.run of the first model and replace it with identical input loaded from a pickled file, this behavior does not occur.
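For reference, this is roughly how the "same output every time" check can be done, by running the first model twice on the same image and comparing the results (names assumed as in the sketch above):

```python
import numpy as np

out_a = feature_model.sess.run(feature_model.inference_op,
                               feed_dict={feature_model.input_ph: img})
out_b = feature_model.sess.run(feature_model.inference_op,
                               feed_dict={feature_model.input_ph: img})
assert np.array_equal(out_a, out_b)  # passes, so the first model is stable
```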
This is the train_op (where labels stands for the ground-truth class indices):

```python
loss_op = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=labels, logits=network.feedforward())
```
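For completeness, the full loss/optimizer wiring is roughly as below; the labels placeholder and the choice of optimizer are illustrative, not the exact code:

```python
logits = network.feedforward()
labels = tf.placeholder(tf.int64, shape=[None])  # ground-truth class ids

loss_op = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=labels, logits=logits))
train_op = tf.train.AdamOptimizer(learning_rate=1e-4).minimize(loss_op)
```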
I know this is vague, but I am happy to provide more details. Any help is appreciated!