TD learning vs Q-learning

In a perfect-information environment, where we can know the state after an action (for example, chess), is there any reason to use Q-learning rather than TD (temporal difference) learning?

As far as I understand, TD learning tries to learn V(state), whereas Q-learning learns Q(state, action), the value of a state-action pair. Does that mean Q-learning learns more slowly, since there are more state-action combinations than states?
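To make the size argument concrete (this framing is mine, not part of the original question), a tabular V table and a tabular Q table differ in the number of entries:

```latex
% Tabular storage: one entry per state for V, one per state-action pair for Q.
\[
  \underbrace{V : \mathcal{S} \to \mathbb{R}}_{|\mathcal{S}|\ \text{entries}}
  \qquad\text{vs.}\qquad
  \underbrace{Q : \mathcal{S} \times \mathcal{A} \to \mathbb{R}}_{|\mathcal{S}|\,|\mathcal{A}|\ \text{entries}}
\]
```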

2 answers

Q-learning is a TD (temporal difference) learning method.

I think you mean to compare TD(0) and Q-learning.

They are not the same comparison. With TD(0), the agent learns the state values V(s) while following a fixed policy, so it is a prediction method. Q-learning instead learns the state-action values Q(s, a) directly, and the greedy policy can be read straight off the learned Q-values.
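A minimal tabular sketch of the two update rules, side by side. The gym-style environment interface (env.reset() -> state, env.step(action) -> (next_state, reward, done)) is an assumption for illustration, not something from this answer:

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.99, 0.1

V = defaultdict(float)   # V: state -> value            (|S| entries)
Q = defaultdict(float)   # Q: (state, action) -> value  (|S|*|A| entries)

def td0_episode(env, policy, V):
    """TD(0) prediction: estimate V(s) while following a fixed policy."""
    s, done = env.reset(), False
    while not done:
        s_next, r, done = env.step(policy(s))
        target = r + (0.0 if done else GAMMA * V[s_next])
        V[s] += ALPHA * (target - V[s])   # V(s) += a * (r + g*V(s') - V(s))
        s = s_next

def q_learning_episode(env, actions, Q):
    """Q-learning control: learn Q(s, a) off-policy via the greedy target."""
    s, done = env.reset(), False
    while not done:
        # epsilon-greedy behavior policy over the learned Q-values
        a = (random.choice(actions) if random.random() < EPSILON
             else max(actions, key=lambda act: Q[(s, act)]))
        s_next, r, done = env.step(a)
        best_next = 0.0 if done else max(Q[(s_next, b)] for b in actions)
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        s = s_next
```

The structural difference is in the target: TD(0) evaluates whatever policy it is given, while Q-learning's max over next actions lets it learn about the greedy policy even while behaving epsilon-greedily.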


(, , "" , ), , (.., ) , V ().

You are right that TD learning estimates V(state), while Q-learning estimates Q(state, action). The point of Q-learning is that it needs no model: V(s) by itself does not "tell" you which action to take from s. To act from V(s) alone, you must be able to generate the successor states of s.

So yes, when the model is known, learning V(s) is more economical than learning Q(s, a), since there are fewer values to estimate, and the greedy action can still be recovered from V(s) by a one-step (lookahead) search through the model.
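A rough sketch of that last point, assuming a hypothetical deterministic model(state, action) -> (next_state, reward) function (reasonable for chess-like games); the helper name and signature are mine, not part of the original answer:

```python
def greedy_action_from_v(state, actions, V, model, gamma=0.99):
    """Recover the greedy action from V(s) via one-step lookahead.

    `model(state, action) -> (next_state, reward)` is a hypothetical
    deterministic transition function, as in chess-like games.
    """
    def one_step_value(a):
        s_next, r = model(state, a)      # expand each successor once
        return r + gamma * V[s_next]     # backed-up value of taking `a`
    return max(actions, key=one_step_value)
```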

The relationship between V and Q is covered in Sutton and Barto's book, Reinforcement Learning: An Introduction.

