No broadcasting for tf.matmul in TensorFlow

I have a problem that I have been struggling with. It is due to tf.matmul() and its lack of broadcasting support.

I am aware of a similar problem at https://github.com/tensorflow/tensorflow/issues/216 , but tf.batch_matmul() does not look like the solution for my case.

I need to encode my input as a 4D tensor:

```python
X = tf.placeholder(tf.float32, shape=(None, None, None, 100))
```

The first dimension is the batch size, the second is the number of records in the batch. Each record can be seen as a composition of several objects (third dimension). Finally, each object is described by a vector of 100 float values.

Please note that I used None for the second and third dimensions, because the actual sizes may vary in each batch. For simplicity, however, let us consider a tensor with concrete sizes:

```python
X = tf.placeholder(tf.float32, shape=(5, 10, 4, 100))
```

These are the steps of my calculation:

  • compute a function of each vector of 100 float values (for example, a linear function):

    ```python
    W = tf.Variable(tf.truncated_normal([100, 50], stddev=0.1))
    Y = tf.matmul(X, W)
    ```

    Problem: tf.matmul() has no broadcasting, so it cannot be used here, and tf.batch_matmul() does not help either. Expected shape of Y: (5, 10, 4, 50)

  • apply average pooling over each record of the batch (i.e., over the objects of each record):

    ```python
    Y_avg = tf.reduce_mean(Y, 2)
    ```

    Expected shape of Y_avg: (5, 10, 50)
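The two steps above can be sketched numerically. This is a NumPy mock-up of the intended shapes, not working TensorFlow code (NumPy's matmul broadcasts over leading dimensions, which is exactly what is being asked for here):

```python
import numpy as np

# Mock inputs with the concrete sizes from the question.
X = np.random.rand(5, 10, 4, 100).astype(np.float32)  # batches, records, objects, features
W = np.random.rand(100, 50).astype(np.float32)        # weights of the linear function

# Step 1: apply the same linear function to every 100-dim object vector.
Y = X @ W
assert Y.shape == (5, 10, 4, 50)

# Step 2: average pooling over the objects of each record (axis 2).
Y_avg = Y.mean(axis=2)
assert Y_avg.shape == (5, 10, 50)
```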

I expected tf.matmul() to support broadcasting. Then I found tf.batch_matmul() , but it still does not seem to apply to my case (for example, W would need to have at least 3 dimensions, and it is not clear why).

By the way, above I used a simple linear function (whose weights are stored in W). In my model, however, I have a deep network. So the more general problem I have is automatically applying a function to each slice of the tensor. This is why I expected tf.matmul() to have a broadcasting mode (and if it did, tf.batch_matmul() might not even be needed).
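For what it's worth, this "apply the same function to every slice" pattern can also be expressed as an einsum-style contraction. A NumPy sketch (np.einsum here is an illustration of the desired broadcasting, not a TensorFlow answer):

```python
import numpy as np

X = np.random.rand(5, 10, 4, 100).astype(np.float32)
W = np.random.rand(100, 50).astype(np.float32)

# Contract the last axis of X against the first axis of W,
# leaving the three leading batch-like axes untouched.
# b=batch, r=record, o=object, k=100 features, f=50 outputs
Y = np.einsum('brok,kf->brof', X, W)
assert Y.shape == (5, 10, 4, 50)
```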

Looking forward to learning from you! Alessio

tensorflow broadcasting
1 answer

This can be done by reshaping X to shape [n, d] , where d is the dimension of one "instance" of the computation (100 in your example), and n is the number of instances in your multidimensional tensor ( 5*10*4=200 in your example). After reshaping, you can use tf.matmul and then reshape back to the required shape. The fact that the first three dimensions can vary makes this a bit more complicated, but you can use tf.shape to determine the actual shapes at runtime. Finally, you can perform the second step of the computation, which is a simple tf.reduce_mean over the appropriate dimension. Overall, it looks like this:

```python
X = tf.placeholder(tf.float32, shape=(None, None, None, 100))
W = tf.Variable(tf.truncated_normal([100, 50], stddev=0.1))
X_ = tf.reshape(X, [-1, 100])
Y_ = tf.matmul(X_, W)
X_shape = tf.gather(tf.shape(X), [0, 1, 2])  # Extract the first three dimensions
target_shape = tf.concat(0, [X_shape, [50]])
Y = tf.reshape(Y_, target_shape)
Y_avg = tf.reduce_mean(Y, 2)
```
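As a sanity check, the reshape trick can be verified with plain NumPy (same shapes as the question; comparing against a direct per-slice product confirms that collapsing and restoring the leading dimensions preserves the layout):

```python
import numpy as np

X = np.random.rand(5, 10, 4, 100).astype(np.float32)
W = np.random.rand(100, 50).astype(np.float32)

# Collapse the three leading dimensions into one: (5*10*4, 100).
X_ = X.reshape(-1, 100)
Y_ = X_ @ W                          # ordinary 2-D matmul
Y = Y_.reshape(X.shape[:3] + (50,))  # restore (5, 10, 4, 50)
Y_avg = Y.mean(axis=2)               # average over the objects: (5, 10, 50)

# The result matches applying W to every 100-dim slice directly.
assert np.allclose(Y, X @ W)
assert Y_avg.shape == (5, 10, 50)
```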
