Why do we usually have more than one fully connected layer in the late stages of a CNN?

As I've noticed, many popular convolutional neural network architectures (such as AlexNet) use more than one fully connected layer of roughly the same size to combine the features extracted by the earlier layers.

Why not use just a single FC layer? Why might this hierarchical arrangement of fully connected layers be more useful?


2 answers

Actually, this is no longer standard practice. Architectures from 2015 onward (e.g., ResNet, Inception v4) end with global average pooling (GAP) + softmax and drop the large fully connected layers entirely. The 2 fully connected layers in VGG16 account for most of its parameters (on the order of 80%). Historically, the FC block is essentially a 2-layer MLP classifier sitting on top of the convolutional features: 1 linear layer can only produce linear decision boundaries over those features, while 2 layers give a non-linear classifier that still trains well with SGD.
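To make the parameter argument concrete, here is a minimal sketch (PyTorch is my assumption; the answer names no framework) that counts the parameters of a VGG16-style FC head versus a GAP head. The layer sizes (512×7×7 conv output, two 4096-wide layers, 1000 classes) follow VGG16's published configuration; dropout is omitted since it adds no parameters.

```python
# Sketch: parameter cost of a VGG16-style FC head vs. a GAP + linear head.
import torch.nn as nn

num_classes = 1000  # ImageNet, as in VGG16

# VGG16-style classifier: flatten the 512 x 7 x 7 conv output,
# then two 4096-wide fully connected layers and a final linear layer.
fc_head = nn.Sequential(
    nn.Flatten(),
    nn.Linear(512 * 7 * 7, 4096), nn.ReLU(),
    nn.Linear(4096, 4096), nn.ReLU(),
    nn.Linear(4096, num_classes),
)

# Modern head: global average pooling collapses each 7x7 feature map
# to a single value, followed by one linear layer (softmax lives in the loss).
gap_head = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(512, num_classes),
)

count = lambda m: sum(p.numel() for p in m.parameters())
print(f"FC head:  {count(fc_head):,} parameters")   # ~123.6M
print(f"GAP head: {count(gap_head):,} parameters")  # ~0.5M
```

Running this shows the FC head alone holds over a hundred million parameters, while the GAP head holds about half a million, which is why the GAP design displaced stacked FC layers.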


Think of the classic XOR problem: a single fully connected layer is a linear classifier and cannot separate classes that are not linearly separable in the feature space. Adding a second fully connected layer (with a non-linearity in between) turns the head into a small MLP that can carve out non-linear decision boundaries over the convolutional features, which is what the hierarchy of FC layers buys you.
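As a small illustration of the XOR point, the sketch below (again assuming PyTorch; the hidden width and training hyperparameters are arbitrary choices of mine) fits both a single linear layer and a 2-layer MLP to XOR. The linear model's loss plateaus; the MLP's typically drops toward zero.

```python
# Sketch: one linear layer cannot fit XOR; a 2-layer MLP can.
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = torch.tensor([[0.], [1.], [1.], [0.]])  # XOR labels

def train(model, steps=2000):
    opt = torch.optim.Adam(model.parameters(), lr=0.1)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        opt.step()
    return loss.item()

linear = nn.Linear(2, 1)                          # one "FC layer"
mlp = nn.Sequential(nn.Linear(2, 8), nn.ReLU(),   # hidden layer adds the
                    nn.Linear(8, 1))              # needed non-linearity

print("linear loss:", train(linear))  # stays high: XOR is not linearly separable
print("mlp loss:   ", train(mlp))     # typically drops toward 0
```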

