In distributed TensorFlow, is it possible to share the same queue between different workers?

In TensorFlow, I want the file name queue to be shared between different workers on different machines, so that each machine can get a subset of the files for training. I searched a lot and it seems that only variables can be placed in the PS task for sharing. Does anyone have an example? Thanks.

You can share the same queue between workers by setting the optional shared_name argument when creating the queue. As with tf.Variable objects, you can place the queue on any device that is accessible from the different workers. For example:

with tf.device("/job:ps/task:0"):
  # Place the queue on the parameter server.
  q = tf.FIFOQueue(..., shared_name="shared_queue")

A few notes:

  • The value of shared_name must be unique to the particular queue you are sharing. Unfortunately, the Python API does not currently apply name scoping or automatic uniquification to shared_name to make this easier, so you will have to ensure uniqueness manually.

  • You do not have to place the queue on the parameter server. One possible configuration is to set up a separate input job (for example, "/job:input") containing a set of tasks that perform preprocessing and expose a shared queue to the workers.
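The mechanism above can be sketched in a single process: an in-process server stands in for the cluster, and two independent graphs/sessions share one queue through it via shared_name. This is a minimal illustration, not a full deployment; in a real cluster the queue would sit under tf.device("/job:ps/task:0") and the sessions would target different workers. The filenames, capacity, and queue name here are illustrative assumptions, and the queue APIs are taken from the v1 compatibility layer.

```python
import tensorflow.compat.v1 as tf  # queue APIs live in the v1 compat layer
tf.disable_eager_execution()

# An in-process server plays every cluster role, so sessions created
# against server.target share the same runtime state.
server = tf.train.Server.create_local_server()

# "Input" side: its own graph and session; it fills the shared queue.
with tf.Graph().as_default():
    q = tf.FIFOQueue(10, tf.string, shared_name="shared_queue")
    enqueue = q.enqueue_many(tf.constant(["a.tfrecord", "b.tfrecord"]))
    with tf.Session(server.target) as sess:
        sess.run(enqueue)

# "Worker" side: a different graph, but the same shared_name against the
# same server, so it attaches to the same underlying queue state.
with tf.Graph().as_default():
    q = tf.FIFOQueue(10, tf.string, shared_name="shared_queue")
    dequeue = q.dequeue()
    with tf.Session(server.target) as sess:
        first = sess.run(dequeue)  # each dequeue removes one filename

print(first)
```

Because each dequeue removes an element from the one shared queue, workers calling dequeue against the same server naturally partition the file list among themselves, which is exactly the behavior the question asks for.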
