In a perfect-information environment, where we can know the state that follows an action, for example chess, is there any reason to use Q-learning rather than TD (temporal difference) learning?
As far as I understand, TD learning tries to estimate V(state), while Q-learning estimates Q(state, action), which means Q-learning should learn more slowly (since there are more state-action combinations than states), right?
Q-learning is itself a TD (temporal difference) learning method.
I think you mean to compare TD(0) with Q-learning.
Yes, in a case like that there can still be a reason, because the two methods answer slightly different questions: tabular TD(0) is a prediction method that evaluates a given policy, whereas Q-learning is a control method that learns action values for an optimal policy.
If you have a model of the environment (in chess, for example, knowing the "rules" lets you compute the position that follows any move), then you can act greedily (i.e., look one step ahead) using V(s) alone.
You are right that TD(0) learns V(s) while Q-learning learns Q(s, a). The advantage of Q(s, a) is that it gives you a policy directly, without a model, whereas V(s) "hides" the choice of action behind predicting the next state s'. The advantage of V(s) is that the table grows only with the number of states s, not with the number of state-action pairs.
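To make the distinction concrete, here is a minimal tabular sketch of the two update rules. All the names (alpha, gamma, V, Q, the function names) are my own illustrative choices, not something from the question:

```python
alpha, gamma = 0.1, 0.99          # step size and discount factor (assumed values)
V = {}                            # state -> value, used by TD(0)
Q = {}                            # (state, action) -> value, used by Q-learning

def td0_update(s, r, s_next):
    """TD(0) prediction: evaluate the policy that generated this transition."""
    v_s = V.get(s, 0.0)
    v_next = V.get(s_next, 0.0)
    V[s] = v_s + alpha * (r + gamma * v_next - v_s)

def q_learning_update(s, a, r, s_next, next_actions):
    """Q-learning control: bootstrap from the best action value in the next state."""
    q_sa = Q.get((s, a), 0.0)
    best_next = max((Q.get((s_next, a2), 0.0) for a2 in next_actions), default=0.0)
    Q[(s, a)] = q_sa + alpha * (r + gamma * best_next - q_sa)
```

Note that the TD(0) update never mentions an action: on its own it only evaluates states under whatever policy produced the data.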
So yes, learning V(s) is generally faster than learning Q(s, a), but to turn V(s) into a policy you need a model of the environment (a way to predict the next state), which you do have in a game like chess.
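As a rough illustration of why the model matters, here is how a greedy policy could be derived in each case. The function model(s, a) is a hypothetical stand-in for the game rules, returning a reward and the next position; V, Q and gamma are the tables and constant from the sketch above:

```python
gamma = 0.99
V, Q = {}, {}   # value tables, filled in by updates like the ones sketched above

def greedy_from_v(s, legal_actions, model):
    # One-step look-ahead: requires a model of the game.
    # `model(s, a)` is a hypothetical function returning (reward, next_state).
    def lookahead(a):
        r, s_next = model(s, a)
        return r + gamma * V.get(s_next, 0.0)
    return max(legal_actions, key=lookahead)

def greedy_from_q(s, legal_actions):
    # Direct argmax over learned action values: no model required.
    return max(legal_actions, key=lambda a: Q.get((s, a), 0.0))
```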
For more detail on the different value functions (V and Q), see Sutton and Barto's RL book.