I have two implementations of a function that, for each channel vector of a 4D tensor x, forms the outer product of the vector with itself, squares it elementwise, and sums the off-diagonal entries (the squared Frobenius norm minus the diagonal). The function is applied to every length-`channels` vector along dimension 3 of x, and all the results are summed. I use this as part of a convolutional network. My TensorFlow version is 0.9.
My first implementation uses the `tf.batch_*` functions:
```python
def test1(x):
    """x: [batch, height, width, channels]"""
    s = x.get_shape().as_list()
    a = tf.reshape(x, [-1, s[3], 1])
    c = tf.batch_matmul(a, a, adj_y=True)
    c2 = tf.square(c)
    diag = tf.batch_matrix_diag_part(c2)
    return tf.reduce_sum(c2) - tf.reduce_sum(diag)
```
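For reference, here is a NumPy sketch (my own illustration, not part of the original code) of the per-pixel quantity test1 computes. The function name is hypothetical. Note that algebraically the result equals (Σ vᵢ²)² − Σ vᵢ⁴, which is handy for checking either implementation:

```python
import numpy as np

def off_diag_sq_sum(v):
    """Sum of squared off-diagonal entries of the outer product v v^T."""
    c = np.outer(v, v)            # [channels, channels]
    c2 = c ** 2
    return c2.sum() - np.trace(c2)

v = np.array([1.0, 2.0, 3.0])
a = off_diag_sq_sum(v)                       # via the explicit outer product
b = (v ** 2).sum() ** 2 - (v ** 4).sum()     # via the closed form
# the two forms agree
```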
This works, but the intermediate tensor c is `channels` times larger than the tensor x, which limits the batch size. So I tried a map_fn-based approach:
```python
def fn(x):
    x1 = tf.reshape(x, [-1, 1])
    c1 = tf.matmul(x1, x1, transpose_b=True)
    c2 = tf.square(c1)
    t1 = tf.trace(c2)
    return tf.reduce_sum(c2) - t1

def test2(x):
    """x: [batch, height, width, channels]"""
    s = x.get_shape().as_list()
    a = tf.reshape(x, [-1, s[3]])
    return tf.reduce_sum(tf.map_fn(fn, a))
```
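To make the intent of test2 concrete, here is a NumPy sketch (names are my own, for illustration only): fn is applied independently to each row of `a` (one pixel's channel vector) and the per-row results are summed. The same total can also be computed without any per-row mapping, using the identity noted above:

```python
import numpy as np

def fn_np(v):
    """NumPy analogue of fn: off-diagonal sum of squares of v v^T."""
    c = np.outer(v, v)
    c2 = c ** 2
    return c2.sum() - np.trace(c2)

rng = np.random.default_rng(0)
a = rng.standard_normal((5, 4))   # 5 "pixels", 4 channels (toy sizes)

# what map_fn does: apply fn to each row, then sum
total = sum(fn_np(row) for row in a)

# fully vectorized equivalent: (sum v_i^2)^2 - sum v_i^4, per row
total_vec = ((a ** 2).sum(axis=1) ** 2 - (a ** 4).sum(axis=1)).sum()
```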
When I run the second function, I get many (50+) log messages like:
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 16084 get requests, put_count=20101 evicted_count=4000 eviction_rate=0.198995 and unsatisfied allocation rate=0
The execution time of test2 is approximately 45 times longer than the execution time of test1.
With parallel_iterations=10, the memory usage of map_fn should be only about 10 * channels * channels, which is much lower than what test1 needs.
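The memory argument can be sketched with rough arithmetic (my own estimate, assuming float32 and illustrative shapes, which are not from the original post):

```python
# Hypothetical shapes for illustration
batch, height, width, channels = 32, 64, 64, 64
bytes_per_float = 4  # float32

# test1 materializes c with shape [batch*height*width, channels, channels]
test1_c = batch * height * width * channels * channels * bytes_per_float

# map_fn with parallel_iterations=10 should only need ~10 outer products
# of shape [channels, channels] in flight at once
map_fn_c = 10 * channels * channels * bytes_per_float

print(test1_c / 2**20, "MiB vs", map_fn_c / 2**20, "MiB")
```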
So the question is: why does the map_fn approach take so much more time, and why does it appear to use more memory rather than less?