I've noticed that many popular convolutional neural network architectures (such as AlexNet) end with more than one fully connected layer of roughly the same size, used to combine the features extracted by the earlier convolutional layers.
Why not use just one FC layer? Why might this hierarchical stack of fully connected layers be more useful?
Partly, this is just history. Most architectures from 2015 onward (e.g. ResNet, Inception v4) drop the FC stack entirely and use global average pooling (GAP) followed by a single softmax layer instead. One big reason is cost: the FC layers in VGG16 account for the bulk (roughly 80% or more) of its parameters. Viewed on its own, the classic head is a 2-layer MLP sitting on top of the convolutional features: one linear layer can only draw a linear decision boundary over those features, while two layers with a nonlinearity in between can model nonlinear combinations of them, and such a head is still straightforward to train with SGD.
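To make the parameter-count argument concrete, the sketch below uses VGG16's published layer sizes (a 7×7×512 conv output, two 4096-unit FC layers, and a 1000-way classifier) to compare the classic FC head against a GAP-based head. The exact totals are my own arithmetic for illustration, not figures from the original answer:

```python
# Compare parameter counts of VGG16's classic FC head vs. a GAP-based head.
# Layer sizes are VGG16's published ones; only the head is counted here.

def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out

conv_features = 7 * 7 * 512          # flattened conv5 output in VGG16

# Classic head: FC-4096 -> FC-4096 -> FC-1000 (softmax)
fc_head = (dense_params(conv_features, 4096)
           + dense_params(4096, 4096)
           + dense_params(4096, 1000))

# Modern head: global average pooling (no parameters) -> FC-1000
gap_head = dense_params(512, 1000)

print(f"FC head:  {fc_head:,} parameters")    # about 123.6 million
print(f"GAP head: {gap_head:,} parameters")   # 513,000
```

Against VGG16's ~138M total parameters, the FC head alone is the clear majority, which is why GAP heads took over.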
Recall the classic XOR example: a single linear layer cannot separate XOR, while a network with one hidden layer can. The extra fully connected layer plays the same role here, letting the network learn combinations of features that are not linearly separable. Whether one or two FC layers works better in practice is ultimately an empirical question, and many modern networks do fine with none at all.
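As a concrete illustration of the XOR point, here is a minimal two-layer network with hand-picked weights (my own choice, purely for illustration) that computes XOR exactly, something no single linear layer can do:

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

# Two-layer network computing XOR with fixed, hand-picked weights:
#   h = relu(x @ W1 + b1);  y = h @ w2
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])        # both hidden units sum the two inputs
b1 = np.array([0.0, -1.0])         # second unit fires only when both inputs are 1
w2 = np.array([1.0, -2.0])         # subtract the "both on" case twice

def xor_net(x):
    return relu(x @ W1 + b1) @ w2

inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
print(xor_net(inputs))   # [0. 1. 1. 0.] -- XOR of each input pair
```

Remove the hidden layer (i.e. keep only a single linear map) and no choice of weights reproduces this output, which is exactly why depth in the classifier head can matter.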