Let's analyze what it means to get stuck in a local optimum. Take a look at the linked SARPROP document. SARPROP is a learning algorithm for feedforward neural networks that specifically aims to avoid getting stuck in local optima. Look at Figure 1 on page 3 of that document: it shows the error surface with respect to a single weight. In the early stages of training, this error surface changes rapidly, but as the algorithm approaches convergence, the error surface with respect to that weight stabilizes. You are stuck in a local optimum with respect to a certain weight if your training algorithm is not able to "push" the weight over the "hill" towards a better optimum. SARPROP tries to solve this problem by adding a noise term to the weight update of the original RPROP, so that the algorithm can be forced out of such "valleys".
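To make that idea concrete, here is a minimal Python/NumPy sketch of an RPROP-style per-weight step-size update in which a gradient sign change triggers both the usual step-size reduction and an annealed noise term. It only illustrates the idea of noisy updates: the constants, the 2^(-T·epoch) schedule and the sign-change handling are simplified placeholders, not the exact SARPROP rule from the paper.

```python
import numpy as np

def rprop_like_step_with_noise(w, grad, prev_grad, step, epoch,
                               eta_plus=1.2, eta_minus=0.5,
                               step_min=1e-6, step_max=50.0,
                               noise_scale=0.01, temperature=0.01,
                               rng=np.random.default_rng(0)):
    """One RPROP-style step-size update with an annealed noise term.

    Simplified sketch only: the constants, the annealing schedule and the
    handling of gradient sign changes are placeholders, not the exact
    RPROP/SARPROP rules from the papers.
    """
    sign_change = grad * prev_grad
    # Same gradient sign as last epoch: grow the per-weight step size.
    step = np.where(sign_change > 0,
                    np.minimum(step * eta_plus, step_max),
                    step)
    # Sign change: shrink the step size and add annealed positive noise,
    # which can push a weight over a small "hill" in the error surface.
    noise = noise_scale * rng.random(w.shape) * 2.0 ** (-temperature * epoch)
    step = np.where(sign_change < 0,
                    np.maximum(step * eta_minus, step_min) + noise,
                    step)
    # Move each weight against the sign of its gradient by its step size.
    return w - np.sign(grad) * step, step
```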
Now, to benchmark convergence to local optima, generate a set of random weight initializations that then remain fixed. Train with a learning algorithm that is known to converge quickly to local optima, such as RPROP. Then take the same weight initialization and apply SARPROP or your new algorithm. Compare, for example, the Root Mean Squared Error on your training data once each network has converged. Do this for a few hundred weight initializations and apply statistics to the results, as in the sketch below.
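A minimal sketch of such a benchmark, assuming you already have the two training routines: `train_rprop` and `train_candidate` are hypothetical callables that take an initial weight vector and return the final training RMSE. The paired Wilcoxon signed-rank test is just one possible choice of statistic.

```python
import numpy as np
from scipy import stats

def compare_convergence(train_rprop, train_candidate,
                        n_inits=200, n_weights=50, seed=42):
    """Compare two training algorithms on identical weight initializations.

    `train_rprop` and `train_candidate` are user-supplied placeholders that
    take an initial weight vector and return the final training RMSE.
    """
    rng = np.random.default_rng(seed)
    rmse_a, rmse_b = [], []
    for _ in range(n_inits):
        # Hand the exact same initialization to both algorithms, so any
        # difference in final RMSE is due to the algorithm, not the start point.
        w0 = rng.normal(scale=0.1, size=n_weights)
        rmse_a.append(train_rprop(w0.copy()))
        rmse_b.append(train_candidate(w0.copy()))
    rmse_a, rmse_b = np.asarray(rmse_a), np.asarray(rmse_b)
    # Each initialization yields one RMSE per algorithm, so use a paired test.
    _, p_value = stats.wilcoxon(rmse_a, rmse_b)
    print(f"mean final RMSE  RPROP: {rmse_a.mean():.4f}   candidate: {rmse_b.mean():.4f}")
    print(f"Wilcoxon signed-rank p-value: {p_value:.4g}")
    return rmse_a, rmse_b
```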