How to feed data to multiple GPUs using the TensorFlow Dataset API

We train TensorFlow models on several GPUs and are evaluating the transition from a queue-based pipeline (using the string_input_producer interface) to the new TensorFlow Dataset API. The latter seems to offer an easier way to switch between training and validation data.

Below is a code snippet that shows how we do this.

train_dataset, train_iterator = get_dataset(train_files, batch_size, epochs)
val_dataset, val_iterator = get_dataset(val_files, batch_size, epochs)

is_validating = tf.placeholder(dtype=bool, shape=())
next_batch = tf.cond(is_validating,
                     lambda: val_iterator.get_next(),
                     lambda: train_iterator.get_next())

validation_tower = self.num_gpus - 1
tower_grads = []

for i in range(self.num_gpus):
    with tf.variable_scope(tf.get_variable_scope(), reuse=(i > 0)):
        with tf.device('/gpu:%d' % i), tf.name_scope('%s_%d' % ('gpu_', i)) as scope:
            if i == validation_tower:
                images, labels = next_batch
                # Loss funcs snipped out
            else:
                images, labels = next_batch
                # Loss funcs snipped out

The get_dataset function creates a dataset and sets its map function and batch size. It also creates an iterator, but does not initialize it; the iterator initializers are run at the beginning of the session.
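For reference, a minimal sketch of what such a get_dataset helper could look like (TF 1.x; the TFRecord input format and the parse_fn are assumptions, not part of the original post):

import tensorflow as tf

def get_dataset(files, batch_size, epochs):
    # Build the input pipeline: decode, batch and repeat.
    dataset = tf.data.TFRecordDataset(files)
    dataset = dataset.map(parse_fn)    # parse_fn decodes one example
    dataset = dataset.batch(batch_size)
    dataset = dataset.repeat(epochs)
    # The iterator is created here but initialized later, via
    # sess.run(iterator.initializer), at the beginning of the session.
    iterator = dataset.make_initializable_iterator()
    return dataset, iterator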

During the session, every few steps we feed is_validating as True through feed_dict so that tf.cond pulls the next batch from the validation dataset.
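A sketch of that loop, assuming the hypothetical names num_steps, validate_every and fetches:

with tf.Session() as sess:
    sess.run([train_iterator.initializer, val_iterator.initializer])
    for step in range(num_steps):
        # Switch to the validation iterator every few steps.
        validating = (step > 0 and step % validate_every == 0)
        sess.run(fetches, feed_dict={is_validating: validating})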

I have a question:

Suppose I have 8 GPUs and training runs on 7 of them. Does the iterator advance only once for all of these 7 GPUs, and therefore provide all 7 GPUs with the same data?

tensorflow tensorflow-gpu tensorflow-datasets
1 answer

Currently, there are three main options, which offer different trade-offs in usability and performance:

  • In the Dataset.batch() transformation, create a single large batch containing examples for all of your GPUs, then use tf.split(..., self.num_gpus) on the output of Iterator.get_next() to create sub-batches for each GPU. This is probably the easiest approach, but it does place the splitting on the critical path. (See the first sketch after this list.)

  • In the Dataset.batch() transformation, create a batch whose size is suited to a single GPU, then call Iterator.get_next() once per GPU to get several different batches. (By contrast, in your current code the same value of next_batch is sent to each GPU, which is probably not what you want.) The second sketch after this list shows this option.

  • Create multiple iterators, one per GPU. Shard the data using Dataset.shard() at the beginning of the pipeline (for example, on the list of files if your dataset is sharded). Note that this approach will consume more resources on the host, so you may need to dial down buffer sizes and/or degrees of parallelism. (See the third sketch after this list.)
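A minimal sketch of the first option, splitting one large batch across GPUs (num_gpus, batch_size_per_gpu, dataset and model_fn are illustrative names, not from the question):

# One large batch for all GPUs, split on the critical path.
dataset = dataset.batch(batch_size_per_gpu * num_gpus)
iterator = dataset.make_one_shot_iterator()
images, labels = iterator.get_next()

image_splits = tf.split(images, num_gpus)
label_splits = tf.split(labels, num_gpus)

for i in range(num_gpus):
    with tf.device('/gpu:%d' % i):
        loss = model_fn(image_splits[i], label_splits[i])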
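The second option in the same notation; each call to Iterator.get_next() creates a separate op in the graph, so each GPU consumes a different batch from the shared iterator:

# Per-GPU batch size; one get_next() op per GPU.
dataset = dataset.batch(batch_size_per_gpu)
iterator = dataset.make_one_shot_iterator()

for i in range(num_gpus):
    images, labels = iterator.get_next()  # a distinct batch per tower
    with tf.device('/gpu:%d' % i):
        loss = model_fn(images, labels)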
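And the third option, sharding at the file level with one pipeline and one iterator per GPU (again with illustrative names; filenames is the list of input files):

for i in range(num_gpus):
    # Each GPU gets its own pipeline over a disjoint subset of files.
    dataset = tf.data.Dataset.from_tensor_slices(filenames)
    dataset = dataset.shard(num_gpus, i)
    dataset = dataset.flat_map(tf.data.TFRecordDataset)
    dataset = dataset.map(parse_fn).batch(batch_size_per_gpu)
    iterator = dataset.make_one_shot_iterator()
    images, labels = iterator.get_next()
    with tf.device('/gpu:%d' % i):
        loss = model_fn(images, labels)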

Note that the current tf.data pipelines run on the CPU only, and an important aspect of an efficient pipeline is staging your training input to the GPU while the previous step is still running. See the TensorFlow CNN benchmarks for example code that shows how to stage data to GPUs efficiently; a sketch of the idea follows below. We are currently working on adding this support to the tf.data API.
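A minimal sketch of that staging idea using tf.contrib.staging.StagingArea (TF 1.x; the surrounding names are assumptions, and the CNN benchmarks code is the authoritative reference):

images, labels = iterator.get_next()  # produced on the CPU

with tf.device('/gpu:0'):
    # Buffer one batch on the GPU so the host-to-device copy for
    # step N+1 overlaps with the compute for step N.
    stage = tf.contrib.staging.StagingArea(
        dtypes=[images.dtype, labels.dtype])
    stage_op = stage.put([images, labels])
    gpu_images, gpu_labels = stage.get()
    loss = model_fn(gpu_images, gpu_labels)

# sess.run(stage_op)              # warm-up: stage the first batch
# sess.run([train_op, stage_op])  # then overlap copy and compute each step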
