Why are the inputs to convolutional neural networks always square images?

I have been doing deep learning with CNNs for a while, and I have noticed that the model inputs are always square images.

As far as I can see, neither the convolution operation nor the network architecture itself requires this property.

So what is the reason for this?

Any comments are appreciated!

+5
3 answers

Because square images are pleasing to the eye. But there are applications that use non-square images when the domain requires it. For example, the original SVHN dataset consists of images containing several digits, and therefore rectangular images are used as input to the convnet, as here

+1

There is no need to have square images. I see two reasons why they are common:

  • scaling: if images with different aspect ratios (and in landscape / portrait orientation) are rescaled automatically, a square target may introduce the smallest error on average.
  • publishing / visualization: square images are easy to display together
+1

From Suhas Pillai:

The problem is not the convolutional layers but the fully connected layers, which require a fixed number of input neurons. For example, take a small network of 3 layers + softmax. Suppose the first 2 layers are convolution + max pooling, the convolution preserves the spatial size, and each pooling step halves both spatial dimensions, which is usually the case. For a 3 * 32 * 32 image (C, W, H) with 4 filters in the first layer and 6 filters in the second layer, the output after convolution + max pooling at the end of the second layer will be 6 * 8 * 8, whereas for a 3 * 64 * 64 image the output at the end of the second layer will be 6 * 16 * 16. Before the fully connected layer, we flatten this output into a single vector (6 * 8 * 8 = 384 neurons for the smaller image, but 6 * 16 * 16 = 1536 for the larger one) and perform the fully connected operation. Thus, you cannot use the same fully connected layer for images of different sizes.

One way to solve this problem is spatial pyramid pooling, where you force the output of the last convolutional layer to be pooled into a fixed number of bins (i.e. neurons), so that the fully connected layer always receives the same number of inputs. You can also check out fully convolutional networks, which can accept non-square images.
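The arithmetic above, and the idea behind spatial pyramid pooling, can be sketched in plain Python. This is only an illustration under stated assumptions, not the answer author's code: the function names `flattened_size` and `spatial_pyramid_pool`, the pyramid levels `(1, 2, 4)`, and the way bin edges are rounded are all choices made here for the example. Feature maps are represented as nested lists indexed as `[channel][row][column]`.

```python
def flattened_size(out_channels, height, width, num_pool_layers=2):
    """Length of the flattened vector fed to the fully connected layer.

    Assumes each convolution preserves the spatial size and each
    pooling step halves both spatial dimensions, as in the answer.
    """
    for _ in range(num_pool_layers):
        height //= 2
        width //= 2
    return out_channels * height * width


def spatial_pyramid_pool(features, levels=(1, 2, 4)):
    """Max-pool a [channel][row][col] feature map to a fixed-length list.

    For each level n the map is split into an n x n grid of bins and the
    maximum of every bin is kept for every channel, so the output length
    is channels * sum(n * n for n in levels), whatever the height and
    width are. Assumes height and width are at least max(levels).
    """
    height = len(features[0])
    width = len(features[0][0])
    pooled = []
    for n in levels:
        for i in range(n):
            # Floor for the start and ceiling for the end of each bin,
            # so the bins cover the whole map and are never empty.
            r0 = (i * height) // n
            r1 = ((i + 1) * height + n - 1) // n
            for j in range(n):
                c0 = (j * width) // n
                c1 = ((j + 1) * width + n - 1) // n
                for channel in features:
                    pooled.append(max(channel[r][c]
                                      for r in range(r0, r1)
                                      for c in range(c0, c1)))
    return pooled


if __name__ == "__main__":
    # Flattened sizes differ with the input size: 384 versus 1536.
    print(flattened_size(6, 32, 32), flattened_size(6, 64, 64))

    # Two feature maps of different (and non-square) sizes ...
    small = [[[float(r * 8 + c) for c in range(8)] for r in range(8)]
             for _ in range(6)]
    large = [[[float(r + c) for c in range(12)] for r in range(16)]
             for _ in range(6)]
    # ... pool to the same length: 6 channels * (1 + 4 + 16) bins.
    print(len(spatial_pyramid_pool(small)), len(spatial_pyramid_pool(large)))
```

With this kind of pooling in front of it, the fully connected layer sees the same number of inputs for both feature maps, which is why the spatial size of the image no longer has to be fixed.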

+1
