We run jobs on several GPUs in TensorFlow and are evaluating the move from the queue-based input pipeline (the string_input_producer interface) to the new TensorFlow Dataset API. The latter seems to offer an easier way to switch between training and validation within the same run.
Below is a code snippet that shows how we do this.
# Build one iterator per split and select between them with tf.cond.
train_dataset, train_iterator = get_dataset(train_files, batch_size, epochs)
val_dataset, val_iterator = get_dataset(val_files, batch_size, epochs)

is_validating = tf.placeholder(dtype=bool, shape=())
next_batch = tf.cond(is_validating,
                     lambda: val_iterator.get_next(),
                     lambda: train_iterator.get_next())

# The last GPU is reserved for validation; the rest are training towers.
validation_tower = self.num_gpus - 1
tower_grads = []

for i in range(self.num_gpus):
    with tf.variable_scope(tf.get_variable_scope(), reuse=(i > 0)):
        with tf.device('/gpu:%d' % i), tf.name_scope('%s_%d' % ('gpu_', i)) as scope:
            if i == validation_tower:
                images, labels = next_batch
The get_dataset function creates a dataset, applies the map function, and sets the batch size. It also creates an iterator, but does not initialize it; the iterators are initialized before the session loop starts.
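The question does not include get_dataset itself; a minimal sketch of what such a helper might look like in TF 1.x is below (parse_example and the TFRecord input format are assumptions, not taken from the original code):

def get_dataset(files, batch_size, epochs):
    # Hypothetical helper: read TFRecord files, parse, batch, and repeat.
    dataset = tf.data.TFRecordDataset(files)
    dataset = dataset.map(parse_example)   # parse_example is a placeholder parse function
    dataset = dataset.batch(batch_size)
    dataset = dataset.repeat(epochs)
    # The iterator is created here but only initialized later, before the session loop.
    iterator = dataset.make_initializable_iterator()
    return dataset, iterator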
During the session we feed the is_validating placeholder through feed_dict; it is normally False, and every few steps we pass True so that the validation dataset is used.
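Concretely, that feeding pattern might look something like the following sketch (train_op, val_loss, num_steps, and validation_interval are illustrative names, not from the original code):

# Initialize both iterators once, before the training loop.
sess.run([train_iterator.initializer, val_iterator.initializer])

for step in range(num_steps):
    # Training step: False makes tf.cond take the training iterator branch.
    sess.run(train_op, feed_dict={is_validating: False})

    if step % validation_interval == 0:
        # Every few steps, feed True so next_batch comes from the validation dataset.
        loss = sess.run(val_loss, feed_dict={is_validating: True})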
I have a question:
Suppose I have 8 GPUs, so training runs on 7 of them. Does the iterator advance from the same point for each of these 7 GPUs, and therefore feed all 7 GPUs the same data?