In many papers people use the order conv -> pooling -> non-linearity. This does not mean that you cannot use a different order and still get reasonable results. In the case of a max-pooling layer and ReLU, the order does not matter (both compute the same thing):

MaxPool(ReLU(x)) = ReLU(MaxPool(x))

You can prove that this holds by recalling that ReLU is an element-wise operation and a non-decreasing function, therefore

ReLU(max(x_1, ..., x_n)) = max(ReLU(x_1), ..., ReLU(x_n))

The same thing holds for almost every activation function (most of them are non-decreasing). But it does not work for a general pooling layer, e.g. average pooling.
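
As a quick illustration (my own example, not from the original answer), take the two inputs -2 and 2 with an average-pooling window that covers both:

ReLU(avg(-2, 2)) = ReLU(0) = 0
avg(ReLU(-2), ReLU(2)) = avg(0, 2) = 1

so with average pooling the two orders generally give different results.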
Although both orders give the same result for max-pooling and ReLU, Activation(MaxPool(x)) is significantly faster because it performs fewer operations: for a pooling layer of size k, it makes k^2 times fewer calls to the activation function.
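
To make both the equivalence and the saving concrete, here is a minimal NumPy sketch (my own illustration; the 2x2 pooling helper and the 4x4 input are arbitrary choices, not anything from the original answer):

```python
import numpy as np

def relu(x):
    # Element-wise ReLU.
    return np.maximum(x, 0)

def max_pool_2x2(x):
    # Non-overlapping 2x2 max pooling on an (H, W) array with even H and W.
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.random.randn(4, 4)

a = relu(max_pool_2x2(x))   # pool first: ReLU touches only 4 values
b = max_pool_2x2(relu(x))   # ReLU first: ReLU touches all 16 values

print(np.allclose(a, b))    # True: both orders agree for max pooling + ReLU
```

The last line prints True, and the version that pools first applies ReLU to a quarter of the values (k^2 = 4 times fewer for k = 2).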
Unfortunately, this optimization is negligible for a CNN, since most of the time is spent in the convolutional layers anyway.
Salvador Dali