Gamma determines how much memory your algorithm has. If you set the value to 0.0, then your algorithm will not update the function of the Q value at all. If you set it to 1.0, then the new experience will have the same weight as the entire previous experience. The best values lie between them and should be determined experimentally.
Here's how it works:
- . s t. , t.
- r t + 1 s t + 1. , , - a t + 1. r t + 1 + Q (s t + 1, a t + 1 > ) - Q (s < > > , < > > ). , Q (s t, a t t). , s t + 1 t + 1 s t t .
, .