How does distributed TensorFlow work? (Problem with tf.train.Server)

I am having trouble with the new TensorFlow feature that lets us run computations in a distributed way.

I just want to run two tf.constant ops on two tasks, but my code never finishes. It looks like this:

    import tensorflow as tf

    cluster = tf.train.ClusterSpec({"local": ["localhost:2222", "localhost:2223"]})
    server = tf.train.Server(cluster, job_name="local", task_index=0)

    with tf.Session(server.target) as sess:
        with tf.device("/job:local/replica:0/task:0"):
            const1 = tf.constant("Hello I am the first constant")
        with tf.device("/job:local/replica:0/task:1"):
            const2 = tf.constant("Hello I am the second constant")
        print sess.run([const1, const2])

The following code, which uses only one task (localhost:2222), does work:

    import tensorflow as tf

    cluster = tf.train.ClusterSpec({"local": ["localhost:2222"]})
    server = tf.train.Server(cluster, job_name="local", task_index=0)

    with tf.Session(server.target) as sess:
        with tf.device("/job:local/replica:0/task:0"):
            const1 = tf.constant("Hello I am the first constant")
            const2 = tf.constant("Hello I am the second constant")
        print sess.run([const1, const2])

Output:

    ['Hello I am the first constant', 'Hello I am the second constant']

Perhaps I am misunderstanding the API. If you have any idea, please let me know.

Thanks ;)

EDIT

Well, I found out that it cannot be started from an interactive IPython session, which is how I was running it. I need to put the code in a Python file and run it from a terminal. But now I have a new problem: when I run my code, the server tries to connect to both of the specified ports, even though I tell it to work on only one of them. My new code is as follows:

    import tensorflow as tf

    tf.app.flags.DEFINE_string('job_name', '', 'One of local worker')
    tf.app.flags.DEFINE_string('local', '',
                               """Comma-separated list of hostname:port for the """)
    tf.app.flags.DEFINE_integer('task_id', 0, 'Task ID of local/replica running the training')
    tf.app.flags.DEFINE_integer('constant_id', 0, 'the constant we want to run')
    FLAGS = tf.app.flags.FLAGS

    local_host = FLAGS.local.split(',')
    cluster = tf.train.ClusterSpec({"local": local_host})
    server = tf.train.Server(cluster, job_name=FLAGS.job_name, task_index=FLAGS.task_id)

    with tf.Session(server.target) as sess:
        if FLAGS.constant_id == 0:
            with tf.device('/job:local/task:' + str(FLAGS.task_id)):
                const1 = tf.constant("Hello I am the first constant")
            print sess.run(const1)
        if FLAGS.constant_id == 1:
            with tf.device('/job:local/task:' + str(FLAGS.task_id)):
                const2 = tf.constant("Hello I am the second constant")
            print sess.run(const2)

I run the following command:

 python test_distributed_tensorflow.py --local=localhost:3000,localhost:3001 --job_name=local --task_id=0 --constant_id=0 

and I get the following logs:

    I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0
    I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0:   Y
    I tensorflow/core/common_runtime/gpu/gpu_device.cc:755] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 970M, pci bus id: 0000:01:00.0)
    I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:206] Initialize HostPortsGrpcChannelCache for job local -> {localhost:3000, localhost:3001}
    I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:202] Started server with target: grpc://localhost:3000
    E0518 15:27:11.794873779 10884 tcp_client_posix.c:173] failed to connect to 'ipv4:127.0.0.1:3001': socket error: connection refused
    E0518 15:27:12.795184395 10884 tcp_client_posix.c:173] failed to connect to 'ipv4:127.0.0.1:3001': socket error: connection refused
    ...
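Those "connection refused" lines just mean that nothing is listening on 127.0.0.1:3001 yet, so gRPC keeps retrying until a server for the second task comes up. A quick way to confirm that is a plain TCP probe (this helper is my own illustration, not part of TensorFlow):

```python
import socket

def port_open(host, port, timeout=1.0):
    # Returns True if something is accepting TCP connections on host:port.
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
        sock.close()
        return True
    except (socket.error, OSError):
        return False

# With only task 0 running, its own port answers but the second port does not,
# which is exactly what the repeated "connection refused" log lines mean.
print(port_open("localhost", 3000))
print(port_open("localhost", 3001))
```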

EDIT 2

I have found the solution: you have to start every task that you list in the ClusterSpec you give to the server. So I have to run this:

    python test_distributed_tensorflow.py --local=localhost:2345,localhost:2346 \
        --job_name=local --task_id=0 --constant_id=0 &
    python test_distributed_tensorflow.py --local=localhost:2345,localhost:2346 \
        --job_name=local --task_id=1 --constant_id=1

I hope this can help someone;)

Tags: python, cluster-computing, server, tensorflow, distributed
