I have little knowledge of machine learning, so please forgive me if my question seems stupid.
Based on what I have read, the best-known model-free reinforcement learning algorithm to date is Q-learning, where each state-action pair in the agent's world is assigned a Q-value, and in each state the agent picks the action with the highest Q-value. The Q-value is then updated as follows:
Q(s, a) = (1 - α) · Q(s, a) + α · (R(s, a, s') + γ · max_{a'} Q(s', a')), where α is the learning rate and γ is the discount factor.
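For concreteness, here is a minimal sketch of that tabular update in Python. The state and action names are placeholders of my own, not from any particular library:

```python
import random
from collections import defaultdict

alpha = 0.1    # learning rate
gamma = 0.9    # discount factor
epsilon = 0.1  # exploration rate for epsilon-greedy action selection

Q = defaultdict(float)  # maps (state, action) -> Q-value, default 0.0

def choose_action(state, actions):
    """Epsilon-greedy: usually pick the action with the highest Q-value."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state, actions):
    """One Q-learning step:
    Q(s,a) <- (1-alpha)*Q(s,a) + alpha*(R + gamma * max_a' Q(s',a'))."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] = (1 - alpha) * Q[(state, action)] + \
                         alpha * (reward + gamma * best_next)
```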
Apparently, for high-dimensional problems the number of states becomes astronomically large, which makes it impossible to store a table of Q-values.
Thus, practical implementations of Q-learning approximate the Q-value by generalizing over states using features. For example, if the agent were Pacman, the features might be:
- Distance to the nearest dot
- Distance to the nearest ghost
- Is Pacman in the tunnel?
Then, instead of a Q-value for every individual state, you only need a Q-value (a weight) for each individual feature.
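Here is a small sketch of how such feature-based approximate Q-learning is usually set up, with one weight per feature rather than one entry per state. The feature names mirror the Pacman examples above, and the feature extractor itself is hypothetical:

```python
alpha = 0.1   # learning rate
gamma = 0.9   # discount factor

def features(state, action):
    """Return a dict of feature values for a (state, action) pair.
    In a real Pacman agent these would be computed from the game state."""
    return {
        "dist-to-nearest-dot": 0.0,    # placeholder values
        "dist-to-nearest-ghost": 0.0,
        "in-tunnel": 0.0,
    }

# One weight per feature instead of one Q-value per state.
weights = {name: 0.0 for name in features(None, None)}

def q_value(state, action):
    """Q(s,a) is approximated as a weighted sum of the features."""
    return sum(weights[f] * v for f, v in features(state, action).items())

def update(state, action, reward, next_state, legal_actions):
    """Approximate Q-learning update: each weight moves in proportion
    to the TD error times its feature value."""
    best_next = max((q_value(next_state, a) for a in legal_actions), default=0.0)
    td_error = (reward + gamma * best_next) - q_value(state, action)
    for f, v in features(state, action).items():
        weights[f] += alpha * td_error * v
```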
So my question is:
Is it possible for a reinforcement learning agent to create or generate additional features on its own?
Some research I have done:
This post mentions Geramifard's iFDD (incremental Feature Dependency Discovery) method.
" ", , , , .
, , apopos " Atari " , " ".
, / . , ?