Why do convolutional neural network inputs always have a square image?

Question

I have been doing deep learning with CNNs for a while, and I have noticed that the inputs to a model are always square images.

As far as I can see, neither the convolution operation nor the network architecture requires this property.

So what is the reason for it?

Any comments are appreciated!

+5

artificial-intelligence deep-learning neural-network

T nguyen Aug 16 '16 at 10:11

3 answers

Yaroslav bulatov · Answer 1 · 2016-08-16T14:52:53+0000

Because square images are pleasing to the eye. But non-square images are used when the domain requires it. For example, each image in the original SVHN dataset contains several digits, so rectangular images are used as input to the convnet, as here

Martin thoma · Answer 2 · 2016-08-24T10:24:17+0000

There is no need to have square images. I see two reasons why they are common anyway:

scaling: if images are rescaled automatically from different aspect ratios (both landscape and portrait), a square target may introduce the least error on average.
publishing / visualization: square images are easy to display side by side.

T nguyen · Answer 3 · 2016-08-26T01:38:26+0000

From Suhas Pillai:

The problem is not the convolutional layers but the fully connected layers of the network, which require a fixed number of neurons. For example, take a small network of 3 layers + softmax. Suppose the first 2 layers are convolution + max pooling, that the dimensions are the same before and after each convolution, and that pooling halves each dimension, which is usually the case. For a 3 * 32 * 32 image (C, W, H) with 4 filters in the first layer and 6 filters in the second layer, the output after convolution + max pooling at the end of the second layer will be 6 * 8 * 8, whereas for a 3 * 64 * 64 image the output at the end of the second layer will be 6 * 16 * 16. Before the fully connected layer, we flatten this into a single vector (6 * 8 * 8 = 384 neurons) and then apply the fully connected operation. So you cannot feed images of different sizes into the same fully connected layer, because its input dimension would differ. One way to solve this is spatial pyramid pooling, where you force the output of the last convolutional layer to be pooled into a fixed number of bins (i.e. neurons), so that the fully connected layer always sees the same number of inputs. You can also look at fully convolutional networks, which can accept non-square images.
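The arithmetic in this answer can be checked with a few lines of plain Python. This is only a sketch under the answer's own assumptions (size-preserving convolutions, 2x2 max pooling). The function names and the pyramid levels (1, 2, 4) are hypothetical, chosen for illustration and not taken from any library.

```python
def conv_pool_shape(channels, width, height, filters_per_layer):
    """Shape after a stack of size-preserving conv + 2x2 max-pool layers."""
    for n_filters in filters_per_layer:
        channels = n_filters   # a conv layer outputs one channel per filter
        width //= 2            # 2x2 pooling halves each spatial dimension
        height //= 2
    return channels, width, height


def flattened_size(shape):
    """Number of inputs the first fully connected layer would need."""
    c, w, h = shape
    return c * w * h


def spp_size(channels, levels=(1, 2, 4)):
    """Length of a spatial-pyramid-pooling vector.

    Each level n pools every channel into n * n bins, so the length
    depends only on the channel count, not on the spatial size.
    The levels used here are an assumed example.
    """
    return channels * sum(n * n for n in levels)


small = conv_pool_shape(3, 32, 32, [4, 6])  # (6, 8, 8)
large = conv_pool_shape(3, 64, 64, [4, 6])  # (6, 16, 16)
wide = conv_pool_shape(3, 64, 32, [4, 6])   # (6, 16, 8), a non-square input

print(flattened_size(small), flattened_size(large), flattened_size(wide))
# 384 1536 768

print(spp_size(6))
# 126
```

The three flattened sizes differ, so one fully connected layer cannot serve all three inputs, while the pyramid-pooled vector has the same length for each of them. The non-square input is no harder than the larger square one: what the fully connected layer needs is a fixed size, not a square shape.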
