Activation function after pooling layer or convolutional layer?

The theory in the links below says that the order of operations in a convolutional network is: Convolutional Layer - Non-linear Activation - Pooling Layer.

  1. Neural Networks and Deep Learning (equation (125))
  2. Deep Learning Book (p. 304, 1st paragraph)
  3. LeNet (equation)
  4. The source referenced in this question's title

But in the implementations from the sources below, the order is: Convolutional Layer - Pooling Layer - Non-linear Activation

  1. network3.py
  2. Source Code, LeNetConvPoolLayer Class

I also tried to study the syntax of the conv2d operation, but it contains no activation function; it is just a convolution with a flipped kernel. Can someone explain why this is the case?

theano neural-network convolution
2 answers

Well, max-pooling and monotonically increasing non-linearities commute. This means that MaxPool(ReLU(x)) = ReLU(MaxPool(x)) for any input, so the result in this case is the same. It is therefore technically better to subsample with max-pooling first and only then apply the non-linearity (if it is costly, such as a sigmoid). In practice it is often done the other way round, and it does not seem to change performance much.
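A quick NumPy check of that commutation claim (my own sketch, not from the answer; max_pool here is a simple non-overlapping 2x2 pool):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.standard_normal((8, 8))          # a toy 8x8 feature map

    def relu(a):
        return np.maximum(a, 0.0)

    def max_pool(a, k=2):
        # non-overlapping k x k max-pooling
        h, w = a.shape
        return a.reshape(h // k, k, w // k, k).max(axis=(1, 3))

    print(np.allclose(max_pool(relu(x)), relu(max_pool(x))))   # True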

As for conv2d, it does not flip the kernel; it implements the exact definition of convolution. It is a linear operation, so you have to add the non-linearity yourself in the next step, e.g. theano.tensor.nnet.relu.
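Here is a minimal sketch of the conv -> pool -> non-linearity pipeline in Theano (my own illustration, not the code from network3.py or LeNetConvPoolLayer; note that pool_2d's window argument is named ws in newer Theano releases and ds in older ones):

    import numpy as np
    import theano
    import theano.tensor as T
    from theano.tensor.nnet import conv2d, relu
    from theano.tensor.signal.pool import pool_2d

    x = T.tensor4('x')  # input: (batch, channels, rows, cols)
    w = theano.shared(np.random.randn(6, 1, 5, 5).astype('float32'), name='w')  # 6 filters of 5x5

    conv_out = conv2d(x, w)                                    # purely linear: no activation inside
    pooled = pool_2d(conv_out, ws=(2, 2), ignore_border=True)  # max-pooling (the default mode)
    out = relu(pooled)                                         # non-linearity added explicitly

    f = theano.function([x], out)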


In many papers people use conv -> pooling -> non-linearity. That does not mean you cannot use a different order and still get reasonable results. In the case of a max-pooling layer and ReLU, the order does not matter (both compute the same thing):

MaxPool(ReLU(x)) = ReLU(MaxPool(x))

You can prove that this is true by recalling that ReLU is an element-wise operation and a non-decreasing function, so

ReLU(MaxPool(x)) = ReLU(max(x_1, ..., x_k)) = max(ReLU(x_1), ..., ReLU(x_k)) = MaxPool(ReLU(x))

The same holds for almost every activation function (most of them are non-decreasing). But it does not work for an average pooling layer.
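A quick counterexample for average pooling (my own illustration): in a window containing -2 and 1, ReLU of the mean is 0, while the mean of the ReLUs is 0.5.

    import numpy as np

    window = np.array([-2.0, 1.0])
    relu = lambda a: np.maximum(a, 0.0)

    print(relu(window.mean()))   # 0.0  -> average pool, then ReLU
    print(relu(window).mean())   # 0.5  -> ReLU, then average pool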


Nevertheless, while both orders produce the same result, Activation(MaxPool(x)) does it significantly faster because it performs fewer operations. For a pooling layer of size k, it makes k^2 times fewer calls to the activation function.
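To make the k^2 factor concrete, here is a simple count of activation calls for a hypothetical 28x28 feature map with non-overlapping 2x2 pooling (my own arithmetic, not from the answer):

    n, k = 28, 2

    calls_relu_then_pool = n * n                 # ReLU applied to every element, then pooled
    calls_pool_then_relu = (n // k) * (n // k)   # ReLU applied only to the pooled output

    print(calls_relu_then_pool, calls_pool_then_relu,
          calls_relu_then_pool // calls_pool_then_relu)   # 784 196 4  (= k**2)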

Unfortunately, this optimization is negligible for CNNs, because most of the time is spent in the convolutional layers anyway.

