Training a neural network with reinforcement learning

I know the basics of feedforward neural networks and how to train them with the backpropagation algorithm, but I'm looking for an algorithm I can use to train an ANN online with reinforcement learning.

For example, the cart-pole swing-up problem is one I would like to solve with an ANN. In that case, I don't know what should be done to control the pendulum; I only know how close I am to the ideal position. I need the ANN to learn from reward and punishment, so supervised learning is not an option.

Another situation is something like the snake game, where feedback is delayed and limited to goals and anti-goals rather than rewards.

I can think of some algorithms for the first situation, such as hill climbing or genetic algorithms, but I'm guessing both would be slow. They might also apply to the second scenario, but there they are incredibly slow and not suited to online learning.
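For reference, hill climbing on the network weights can be sketched roughly as below. This is a minimal illustration under stated assumptions, not something from the original thread: `hill_climb` and `toy_evaluate` are hypothetical names, and `toy_evaluate` stands in for running a whole episode (for example, one cart-pole trial) and returning its total reward. The sketch also makes the slowness concrete: every candidate weight vector costs one full episode to score.

```python
import random

def hill_climb(evaluate, n_weights, steps=1000, step_size=0.1, seed=0):
    """Randomly perturb all weights; keep the perturbation only if the
    episodic reward returned by `evaluate` improves."""
    rng = random.Random(seed)
    best = [rng.uniform(-1, 1) for _ in range(n_weights)]
    best_score = evaluate(best)
    for _ in range(steps):
        # Gaussian perturbation of every weight at once.
        candidate = [w + rng.gauss(0, step_size) for w in best]
        score = evaluate(candidate)  # one "episode" per candidate
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score

# Hypothetical stand-in for "run one episode, return total reward":
# the closer the weights are to a hidden target, the higher the reward.
TARGET = [0.5, -0.3, 0.8]

def toy_evaluate(weights):
    return -sum((w - t) ** 2 for w, t in zip(weights, TARGET))

weights, score = hill_climb(toy_evaluate, n_weights=3)
print(round(score, 4))
```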

My question is simple: Is there a simple algorithm for training an artificial neural network with reinforcement learning? I am mainly interested in real-time reward situations, but if an algorithm for goal-based situations is available, even better.

+55
language-agnostic algorithm reinforcement-learning machine-learning neural-network
May 23 '12 at 2:27 pm
2 answers

There are several research papers on the topic:

And some code:

These are just some of the top Google search results on the topic. The first couple of papers look pretty good, although I haven't read them personally. I think you'll find even more information on neural networks with reinforcement learning if you do a quick search on Google Scholar.

+24
May 23 '12 at 14:42

If the output that led to a reward r is backpropagated through the network r times, you reinforce the network in proportion to the reward. This does not apply directly to negative rewards, but I can think of two solutions that produce different effects:
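A minimal sketch of this idea, assuming a single-layer network of sigmoid output units (one per action) in place of a real multi-layer ANN, and a reward that is already a non-negative integer. `TinyNet`, `one_hot` and `reinforce` are hypothetical names chosen for illustration. The chosen action is treated as a supervised target, and one backpropagation step is taken per unit of reward.

```python
import math
import random

class TinyNet:
    """Single layer of sigmoid outputs, one per action (illustrative only)."""
    def __init__(self, n_inputs, n_actions, lr=0.1, seed=0):
        rng = random.Random(seed)
        self.lr = lr
        # Each row holds the weights of one output unit; the last one is a bias.
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(n_inputs + 1)]
                  for _ in range(n_actions)]

    def forward(self, x):
        xb = list(x) + [1.0]  # append constant bias input
        return [1 / (1 + math.exp(-sum(wi * xi for wi, xi in zip(row, xb))))
                for row in self.w]

    def backprop(self, x, target):
        """One gradient-descent step on squared error toward `target`."""
        xb = list(x) + [1.0]
        out = self.forward(x)
        for k, row in enumerate(self.w):
            delta = (target[k] - out[k]) * out[k] * (1 - out[k])
            for i in range(len(row)):
                row[i] += self.lr * delta * xb[i]

def one_hot(k, n):
    return [1.0 if i == k else 0.0 for i in range(n)]

def reinforce(net, x, action, reward):
    """Backpropagate the chosen action as the target `reward` times,
    so larger rewards reinforce that action more strongly."""
    target = one_hot(action, len(net.w))
    for _ in range(reward):
        net.backprop(x, target)

# Usage: action 1 in this state earned a reward of 5.
net = TinyNet(n_inputs=2, n_actions=3)
state = [1.0, 0.0]
before = net.forward(state)[1]
reinforce(net, state, action=1, reward=5)
after = net.forward(state)[1]
print(before < after)  # True
```

Repeating the update r times is the simplest reading of the answer; scaling the learning rate by r would be a close alternative with similar effect for small steps.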

1) If your rewards fall in the range rmin to rmax, shift them to the range 0 to (rmax - rmin) so that they are all non-negative. The bigger the reward, the stronger the reinforcement.
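The shift in option 1 is just r - rmin. A small sketch, assuming the shifted value is then used as the integer repetition count for the backpropagation scheme above; `shifted_repetitions` and its `scale` parameter are hypothetical, with `scale` only there to turn fractional rewards into usable counts.

```python
def shifted_repetitions(rewards, r_min, scale=1.0):
    """Map rewards from [r_min, r_max] to [0, r_max - r_min] and round each
    to a non-negative integer number of backprop repetitions."""
    return [int(round((r - r_min) * scale)) for r in rewards]

rewards = [-2, 0, 3]  # observed rewards: r_min = -2, r_max = 3
print(shifted_repetitions(rewards, r_min=-2))  # [0, 2, 5]
```

Note that the lowest possible reward maps to zero repetitions, so under this option the worst outcome is simply not reinforced rather than actively discouraged.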

2) For a negative reward -r, backpropagate a random output r times, as long as it differs from the one that led to the negative reward. This not only reinforces desirable outputs but also diffuses or avoids bad ones.
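Option 2 could look roughly like this. It is a sketch under assumptions: `punish` is a hypothetical name, it accepts any `backprop(x, target)` callable (such as a network's own backprop method), there are at least two actions, and a fresh alternative action is drawn on each repetition. The answer does not say whether one random output or a new one per repetition is meant, so this is one interpretation.

```python
import random

def punish(backprop, x, bad_action, n_actions, penalty, rng=random):
    """For a negative reward -penalty, backpropagate a randomly chosen
    action other than `bad_action` as the target, `penalty` times."""
    alternatives = [a for a in range(n_actions) if a != bad_action]
    for _ in range(penalty):
        a = rng.choice(alternatives)  # never the action that was punished
        target = [1.0 if i == a else 0.0 for i in range(n_actions)]
        backprop(x, target)

# Usage with a stand-in backprop that only records the targets it receives.
seen = []
punish(lambda x, target: seen.append(target), [0.3, -0.7],
       bad_action=1, n_actions=3, penalty=4, rng=random.Random(42))
print(len(seen))  # 4
```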

+7
May 23 '12 at 14:42