Note that autoencoders are trying to learn a non-trivial approximation of the identity function, not the identity function itself; otherwise they would not be useful at all. Pre-training helps move the weight vectors toward a good starting region on the error surface. Then the backpropagation algorithm, which essentially performs gradient descent, is used to refine those weights. Keep in mind that plain gradient descent can get stuck in poor local minima.
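
To make this concrete, here is a minimal sketch (not from the original answer; it assumes only NumPy, and the layer sizes and toy data are illustrative) of an undercomplete autoencoder: the 2-unit bottleneck forces it to learn a compressed code rather than the trivial identity mapping, and the training loop is exactly the plain gradient descent on reconstruction error that can get stuck in poor local minima.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))            # toy data: 500 samples, 8 features

n_in, n_hidden = 8, 2                    # bottleneck (2) < input (8) => identity is impossible
W1 = rng.normal(scale=0.1, size=(n_in, n_hidden))   # encoder weights
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_hidden, n_in))   # decoder weights
b2 = np.zeros(n_in)

lr = 0.1
for step in range(2000):
    # forward pass: encode, then try to reconstruct the input
    H = np.tanh(X @ W1 + b1)             # hidden code, shape (500, 2)
    X_hat = H @ W2 + b2                  # reconstruction, shape (500, 8)
    err = X_hat - X
    loss = np.mean(err ** 2)

    # backpropagation: plain gradient descent on the reconstruction error
    dX_hat = 2 * err / err.size          # d(loss) / d(X_hat)
    dW2 = H.T @ dX_hat
    db2 = dX_hat.sum(axis=0)
    dH = dX_hat @ W2.T
    dZ1 = dH * (1 - H ** 2)              # tanh'(z) = 1 - tanh(z)^2
    dW1 = X.T @ dZ1
    db1 = dZ1.sum(axis=0)

    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

    if step % 500 == 0:
        print(f"step {step}: reconstruction loss {loss:.4f}")
```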

[Ignore the label "Global Minima" in the linked image and think of it as just another, better, local minimum.]
Intuitively, suppose you are looking for the best way to get from a source A to a destination B. A map with no routes marked on it (the errors you get at the output layer of the neural network) tells you roughly where to go, but the route you end up taking may have many obstacles and ups and downs. Now suppose someone who has already made the trip tells you which direction they went (pre-training) and gives you a new map (the starting point that the pre-training phase provides).
This may be an intuitive reason why starting from random weights and immediately optimizing the model with backpropagation does not necessarily give you the performance you get with a pre-trained model. However, note that many models that achieve state-of-the-art results do not use pre-training at all; instead, they use backpropagation in combination with other optimization methods (for example, Adagrad, RMSProp, Momentum, and others) in the hope of avoiding getting stuck in bad local minima.
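
For reference, here is a minimal sketch (again illustrative and assuming NumPy; the parameter values are arbitrary defaults) of the update rules behind the optimizers named above. Each one rescales or smooths the raw gradient step, which in practice helps training make progress through difficult regions of the error surface.

```python
import numpy as np

def sgd_momentum(w, grad, velocity, lr=0.01, beta=0.9):
    velocity = beta * velocity - lr * grad             # accumulate a running "velocity"
    return w + velocity, velocity

def adagrad(w, grad, cache, lr=0.01, eps=1e-8):
    cache = cache + grad ** 2                          # accumulate squared gradients
    return w - lr * grad / (np.sqrt(cache) + eps), cache

def rmsprop(w, grad, cache, lr=0.001, decay=0.9, eps=1e-8):
    cache = decay * cache + (1 - decay) * grad ** 2    # exponential moving average of squared gradients
    return w - lr * grad / (np.sqrt(cache) + eps), cache

# usage: start from zero optimizer state and apply one update to a toy parameter vector
w = np.array([1.0, -2.0]); grad = np.array([0.3, -0.1])
w, v = sgd_momentum(w, grad, velocity=np.zeros_like(w))
```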

Amir