SARSA Algorithm

I'm having trouble understanding the SARSA algorithm: http://en.wikipedia.org/wiki/SARSA

In particular, when updating the Q value, what is gamma? And what values are used for s(t+1) and a(t+1)?

Can someone explain this algorithm to me?

Thanks.

1 answer

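Gamma (γ) is the discount factor: it determines how much the estimated value of the next state-action pair counts relative to the immediate reward. With γ = 0.0 the algorithm only looks at the immediate reward; the closer γ is to 1.0, the more weight long-term reward gets. How much memory your algorithm has is controlled by the other parameter in the update, the learning rate alpha (α). If you set α to 0.0, the algorithm will not update the Q-value function at all. If you set it to 1.0, each new experience completely overwrites the previous estimate. The best values lie somewhere in between and should be determined experimentally.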
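To see what each parameter does numerically, here is a minimal sketch of a single update, assuming the standard form Q(s_t, a_t) ← Q(s_t, a_t) + α [r_{t+1} + γ Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t)] from the linked Wikipedia page. The function name `sarsa_update` and the numbers are made up for illustration.

```python
def sarsa_update(q_sa, reward, q_next, alpha, gamma):
    """Return the new Q(s_t, a_t) after one SARSA update.

    q_sa   -- current estimate Q(s_t, a_t)
    reward -- r_{t+1}
    q_next -- current estimate Q(s_{t+1}, a_{t+1})
    """
    td_error = reward + gamma * q_next - q_sa
    return q_sa + alpha * td_error


# Illustrative numbers only (not from the question).
q_sa, reward, q_next = 2.0, 1.0, 4.0

# alpha = 0: the estimate is not changed at all.
print(sarsa_update(q_sa, reward, q_next, alpha=0.0, gamma=0.9))  # 2.0

# alpha = 1, gamma = 0: the estimate is replaced by the immediate reward.
print(sarsa_update(q_sa, reward, q_next, alpha=1.0, gamma=0.0))  # 1.0

# Values in between blend the old estimate with the new target.
print(sarsa_update(q_sa, reward, q_next, alpha=0.5, gamma=0.5))  # 2.5
```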

Here's how it works:

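  • You are in state s_t and choose action a_t.
  • You receive reward r_{t+1} and end up in state s_{t+1}. There you choose your next action, a_{t+1}. Now compute r_{t+1} + γ Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) and use it to update Q(s_t, a_t). After that, s_{t+1} and a_{t+1} become the new s_t and a_t.

Then repeat.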
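The steps above can be sketched as a loop. This is only an illustration under assumptions that are not in the answer: the corridor environment (`step`), the epsilon-greedy rule in `choose_action`, and all parameter values are hypothetical, and the answer itself does not say how actions are chosen.

```python
import random
from collections import defaultdict

N_STATES = 5        # hypothetical corridor: states 0..4, state 4 is terminal
ACTIONS = (-1, +1)  # step left or step right


def step(state, action):
    """Hypothetical environment: reward 1.0 on reaching the last state, else 0.0."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy (an assumed policy): usually the best known action, sometimes random."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])


def run_sarsa(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)  # Q(s, a), all estimates start at 0.0
    for _ in range(episodes):
        s = 0
        a = choose_action(q, s, epsilon, rng)            # pick a_t in s_t
        done = False
        while not done:
            s_next, r, done = step(s, a)                 # observe r_{t+1}, s_{t+1}
            a_next = choose_action(q, s_next, epsilon, rng)  # pick a_{t+1}
            # A terminal state has no future value, so the target is just r.
            target = r if done else r + gamma * q[(s_next, a_next)]
            q[(s, a)] += alpha * (target - q[(s, a)])    # update Q(s_t, a_t)
            s, a = s_next, a_next                        # shift t+1 to t
    return q


q = run_sarsa()
for state in range(N_STATES - 1):
    print(state, {a: round(q[(state, a)], 3) for a in ACTIONS})
```

Note that a_{t+1} is the action the agent actually takes on the next step, which is why the same variable is reused after the update.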
