Random Crop and Flip in Convolutional Neural Networks

In many scientific articles I've read about convolutional neural networks (CNNs), I see that people randomly crop a square region (e.g. 224x224) from images and then randomly flip it horizontally. Why this random cropping and flipping? Also, why do people always crop a square region? Can CNNs not work on rectangular regions?

+8
image-processing neural-network conv-neural-network
2 answers

This is called data augmentation. By applying transformations to the training data, you add synthetic data points. This exposes the model to additional variations without the cost of collecting and annotating more data. It can reduce overfitting and improve the model's generalization.
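A minimal sketch of these two augmentations, assuming images are NumPy arrays in H x W x C layout (the function name and defaults here are illustrative, not from any particular library):

```python
import numpy as np

def random_crop_and_flip(img, crop_size=224, rng=None):
    """Randomly crop a square patch and flip it horizontally with
    probability 0.5. Assumes img is an H x W x C array with
    H, W >= crop_size."""
    rng = rng if rng is not None else np.random.default_rng()
    h, w = img.shape[:2]
    top = rng.integers(0, h - crop_size + 1)    # random crop origin
    left = rng.integers(0, w - crop_size + 1)
    patch = img[top:top + crop_size, left:left + crop_size]
    if rng.random() < 0.5:                      # coin-flip horizontal mirror
        patch = patch[:, ::-1]
    return patch
```

In practice you would apply this on the fly each epoch, so the network sees a different crop of each image every time.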

The intuition behind flipping is that an object should be equally recognizable as its mirror image. Note that horizontal flipping is the commonly used variant; vertical flipping does not always make sense, but that depends on the data.

The idea of cropping is to reduce the contribution of the background to the CNN's decision. If you have labels that localize your object, cropping also lets you use the surrounding regions as negative examples and build a better detector. Random cropping can further act as a regularizer: it bases the classification on the presence of parts of the object, rather than letting everything hinge on one very distinctive feature that may not always be present.

Why do people always crop a square area?

This is not a limitation of CNNs. It may be a limitation of a particular implementation, or a deliberate choice, since accepting square input can lead to implementations optimized for speed. I would not read too much into it.

CNNs with variable-size input versus fixed-size input:

This does not apply to square cropping specifically, but more generally to why the input is sometimes resized/cropped/warped before being fed into a CNN:

Something to keep in mind is that designing a CNN involves deciding whether to support variable-size input or not. Convolution, pooling, and nonlinearity operations work for any input dimensions. However, when you use a CNN for image classification, you usually attach fully connected layer(s), such as logistic regression or an MLP. The fully connected layer is how the CNN produces a fixed-size output vector, and a fixed-size output may in turn constrain the CNN to a fixed-size input.
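To see why the fully connected layer forces a fixed input size, consider the size of the flattened feature vector it receives. This toy arithmetic (assuming only 2x2 poolings change the spatial dimensions; real architectures also involve strides and padding) shows that different input sizes yield different vector lengths, so one fixed FC weight matrix cannot serve both:

```python
def fc_input_size(h, w, channels, num_pools):
    """Length of the flattened vector fed to the first fully connected
    layer, after num_pools 2x2 poolings each halve the spatial dims.
    Toy model: ignores stride/padding effects of the conv layers."""
    for _ in range(num_pools):
        h, w = h // 2, w // 2
    return channels * h * w

fc_input_size(224, 224, 64, 4)  # 64 * 14 * 14 = 12544
fc_input_size(256, 256, 64, 4)  # 64 * 16 * 16 = 16384 -- mismatch
```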

There are definitely workarounds that allow variable-size input while still producing a fixed-size output. The simplest is to use the convolutional layers to classify regular patches of the image. This idea has been around for a while: it was used to detect multiple occurrences of an object in an image and classify each occurrence. The earliest example I can think of is the work of Yann LeCun's group in the 1990s on simultaneously classifying and localizing digits in a string. This is called turning a CNN with fully connected layers into a fully convolutional network. The most recent examples of fully convolutional networks are applied to semantic segmentation, classifying each pixel in an image; there the output is required to match the size of the input. Another solution is to use global pooling at the end of the CNN to turn variable-size feature maps into a fixed-size output. The pooling window is set equal to the size of the feature map computed from the last conv layer.
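The global-pooling idea can be sketched in a couple of lines (assuming feature maps in C x H x W layout): whatever the spatial size of the last conv layer's output, averaging over the full spatial extent yields a vector whose length depends only on the channel count.

```python
import numpy as np

def global_avg_pool(feature_maps):
    """Average each channel over its full spatial extent.
    feature_maps: C x H x W array, where H and W may vary per input.
    Returns a fixed-size vector of length C."""
    return feature_maps.mean(axis=(1, 2))
```

Because the output length is C regardless of H and W, a fully connected classifier placed after this pool works for any input image size.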

+15

@ypx already gives a good answer on why data augmentation is needed. I am going to share more information on why people use fixed-size square images as input.

Why a fixed input image size?

If you have basic knowledge of convolutional neural networks, you will know that convolutional layers, pooling layers, and nonlinearity layers are all fine with input images of variable size. But neural networks usually have fully connected layers as the classifier, and the weight matrix between the last conv layer and the first fully connected layer has a fixed size. If you give the network an input image of variable size, there will be a problem because the feature-map size and the weight matrix do not match. This is one reason to use a fixed input image size.

Another reason is that by fixing the image size, the training time of the neural network can be reduced. This is because most (if not all) deep learning packages are written to process a batch of images in tensor format (usually of shape (N, C, H, W), where N is the batch size, C the number of channels, and H and W the height and width of the images). If your input images do not have a fixed size, you cannot stack them into a batch. Even if your network can handle variable-size input images, you still have to feed in 1 image at a time, which is slower than batch processing.
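A quick illustration of the batching constraint (using NumPy arrays as stand-ins for framework tensors): same-size images stack into one (N, C, H, W) tensor, while mixed sizes cannot be stacked at all.

```python
import numpy as np

# Eight same-size images stack into a single (N, C, H, W) batch tensor
# that the framework can process in one pass:
images = [np.random.rand(3, 224, 224) for _ in range(8)]
batch = np.stack(images)  # shape (8, 3, 224, 224)

# Images of different sizes cannot be stacked -- np.stack raises a
# ValueError -- which forces one-image-at-a-time processing instead.
```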

Can variable-size input images be used?

Yes. As long as you can produce a fixed-size input for the fully connected layers, the input image size does not matter. A good choice is adaptive pooling, which produces fixed-size output feature maps from variable-size input feature maps. Right now, PyTorch provides two adaptive pooling layers for images, namely AdaptiveMaxPool2d and AdaptiveAvgPool2d . You can use these layers to build a neural network that accepts input images of variable size.
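A rough NumPy sketch of what adaptive average pooling computes (the window bounds follow a floor/ceil partitioning scheme; the helper name is my own, and this is a reference sketch rather than PyTorch's actual implementation):

```python
import numpy as np

def adaptive_avg_pool2d(x, out_h, out_w):
    """Average-pool a C x H x W feature map down to C x out_h x out_w,
    whatever H and W are. Output cell (i, j) averages the window
    rows floor(i*H/out_h) .. ceil((i+1)*H/out_h), likewise for cols."""
    c, h, w = x.shape
    out = np.empty((c, out_h, out_w))
    for i in range(out_h):
        h0 = (i * h) // out_h
        h1 = -((-(i + 1) * h) // out_h)   # ceil((i+1)*h/out_h)
        for j in range(out_w):
            w0 = (j * w) // out_w
            w1 = -((-(j + 1) * w) // out_w)
            out[:, i, j] = x[:, h0:h1, w0:w1].mean(axis=(1, 2))
    return out
```

Feature maps of any spatial size come out as C x out_h x out_w, so the fully connected layers that follow always see the same vector length.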

+1
