Tensorflow: What does tf.nn.separable_conv2d do?

I'm not quite sure what tf.nn.separable_conv2d does exactly. It seems that the pointwise_filter acts as a scaling factor for the different features when generating one pixel of the next layer, but I'm not sure whether that interpretation is correct. Is there a reference for this method, and what is it used for?

tf.nn.separable_conv2d produces output of the same shape as tf.nn.conv2d, so I assumed I could simply replace tf.nn.conv2d with tf.nn.separable_conv2d. But the results with tf.nn.separable_conv2d were very poor: the network stopped learning very early, and on the MNIST dataset the accuracy was no better than random guessing, around 10%.

I thought that if I set the pointwise_filter to all ones and made it non-trainable, I would get the same behaviour as tf.nn.conv2d. But in fact I again got only about 10% accuracy.

But when tf.nn.conv2d is used with the same hyperparameters, the accuracy reaches about 99%. Why?

In addition, it requires channel_multiplier * in_channels < out_channels. Why? And what is the role of channel_multiplier here?

Thanks.

Edit:

I previously used channel_multiplier = 1. Maybe that was a bad choice. After changing it to 2, the accuracy became much better. But what is the role of channel_multiplier, and why is 1 not a good value?

2 answers

tf.nn.separable_conv2d() implements the so-called "separable convolution", described from slide 26 onward in this presentation.

The idea is that, instead of convolving jointly across all channels of an image, you run a separate 2D convolution on each channel with a depth of channel_multiplier. The in_channels * channel_multiplier intermediate channels are then concatenated together and mapped to out_channels using a 1x1 convolution.
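For concreteness, here is a minimal sketch (my own example with made-up sizes, assuming TensorFlow 2.x in eager mode, not code from the original answer) showing the filter shapes involved and the equivalence of tf.nn.separable_conv2d to a depthwise convolution followed by a 1x1 convolution:

    import tensorflow as tf

    # Made-up sizes for illustration only.
    batch, height, width = 1, 28, 28
    in_channels, channel_multiplier, out_channels = 3, 4, 32

    x = tf.random.normal([batch, height, width, in_channels])

    # Depthwise filter: channel_multiplier separate 3x3 kernels per input channel.
    depthwise_filter = tf.random.normal([3, 3, in_channels, channel_multiplier])
    # Pointwise filter: a 1x1 convolution that maps the
    # in_channels * channel_multiplier intermediate channels to out_channels.
    pointwise_filter = tf.random.normal([1, 1, in_channels * channel_multiplier, out_channels])

    y = tf.nn.separable_conv2d(x, depthwise_filter, pointwise_filter,
                               strides=[1, 1, 1, 1], padding='SAME')
    print(y.shape)  # (1, 28, 28, 32)

    # The same result in two explicit steps: depthwise convolution, then 1x1 convolution.
    intermediate = tf.nn.depthwise_conv2d(x, depthwise_filter,
                                          strides=[1, 1, 1, 1], padding='SAME')
    y2 = tf.nn.conv2d(intermediate, pointwise_filter,
                      strides=[1, 1, 1, 1], padding='SAME')
    print(tf.reduce_max(tf.abs(y - y2)).numpy())  # ~0, up to floating-point error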

This is often an effective way to reduce the parametric complexity of the early convolutions in a network, and it can materially speed up training. channel_multiplier controls that complexity and would typically be 4 to 8 for an RGB input. For a grayscale input, using it makes little sense.
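As a rough back-of-the-envelope comparison of parameter counts (my own numbers, not taken from the answer), for a 3x3 kernel on an RGB input:

    # Made-up sizes for illustration only.
    kh, kw = 3, 3
    in_channels, channel_multiplier, out_channels = 3, 4, 64

    standard_conv_params = kh * kw * in_channels * out_channels  # 1728
    separable_conv_params = (kh * kw * in_channels * channel_multiplier      # depthwise part: 108
                             + in_channels * channel_multiplier * out_channels)  # pointwise part: 768
    print(standard_conv_params, separable_conv_params)  # 1728 vs 876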


To answer the last part of the question:

In addition, it requires channel_multiplier * in_channels < out_channels. Why?

I don't know why this restriction was originally introduced, but it has been removed in the current master branch of TensorFlow and should make it into version 1.3. The thinking was probably something along the lines of: "If you are going to reduce the number of channels in the pointwise step anyway, you might as well choose a smaller channel_multiplier and save on computation." I guess this reasoning is flawed, because the pointwise step can combine values from different depthwise filters, or perhaps because one might want to reduce the dimension somewhat, but not by the full factor.
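As a concrete (made-up) illustration of what the old check forbade, taking the requirement channel_multiplier * in_channels < out_channels quoted in the question at face value:

    # Made-up sizes; this is the kind of configuration the old restriction rejected.
    in_channels, channel_multiplier, out_channels = 32, 2, 16

    intermediate_channels = channel_multiplier * in_channels  # 64
    # The 1x1 pointwise step shrinks 64 intermediate channels down to 16 outputs,
    # so channel_multiplier * in_channels < out_channels does not hold here.
    print(intermediate_channels < out_channels)  # False -> previously disallowed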

