In Q-learning with function approximation, can hand-crafted features be avoided?

I have little knowledge of machine learning, so please forgive me if my question seems stupid.

Based on what I have read, the best-known model-free learning algorithm to date is Q-learning, where each state-action pair in the agent's world is given a Q-value, and in each state the agent chooses the action with the highest Q-value. The Q-value is then updated as follows:

Q(s, a) ← (1 − α) · Q(s, a) + α · (R(s, a, s') + γ · max_a' Q(s', a')), where α is the learning rate and γ the discount factor.
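For concreteness, here is a minimal tabular sketch of that update in Python. The environment interface (`reset()`, `step()`, an `actions` list) and the ε-greedy exploration are assumptions added for the example, not part of the question.

```python
import random
from collections import defaultdict

def q_learning_episode(env, Q, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Run one episode of tabular Q-learning.

    Q is a dict mapping (state, action) -> value.
    env is assumed to expose reset() -> state,
    step(action) -> (next_state, reward, done), and a list env.actions.
    """
    s = env.reset()
    done = False
    while not done:
        # epsilon-greedy action selection
        if random.random() < epsilon:
            a = random.choice(env.actions)
        else:
            a = max(env.actions, key=lambda act: Q[(s, act)])

        s_next, r, done = env.step(a)

        # Q(s,a) <- (1-alpha) Q(s,a) + alpha (r + gamma * max_a' Q(s',a'))
        best_next = 0.0 if done else max(Q[(s_next, act)] for act in env.actions)
        Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * (r + gamma * best_next)

        s = s_next
    return Q

# Q-table with a default value of 0 for unseen state-action pairs
Q = defaultdict(float)
```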

Apparently, for high-dimensional problems the number of states becomes astronomically large, which makes it impossible to store a table of Q-values.

Thus, a practical implementation of Q-learning requires approximating the Q-values with features that generalize over states. For example, if the agent were Pac-Man, the features might be:

  • Distance to the nearest food dot
  • Distance to the nearest ghost
  • Is Pac-Man in a tunnel?

And then, instead of Q-values for each individual state, you only need Q-values (weights) for each individual feature.
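One common way to do this (an assumption here, since the question does not say how the features are combined) is a linear approximation Q(s, a) ≈ Σ_i w_i · f_i(s, a), with one learned weight per feature. A minimal sketch:

```python
def q_value(weights, features):
    """Approximate Q(s, a) as a weighted sum of feature values.

    features: dict feature_name -> value f_i(s, a)
    weights:  dict feature_name -> learned weight w_i
    """
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

def update_weights(weights, features, reward, next_max_q, current_q,
                   alpha=0.1, gamma=0.99):
    """Semi-gradient update: each weight moves in proportion to its feature value."""
    td_error = (reward + gamma * next_max_q) - current_q
    for name, value in features.items():
        weights[name] = weights.get(name, 0.0) + alpha * td_error * value
    return weights

# Hypothetical feature values for one Pac-Man-style (state, action) pair
features = {
    "dist_to_nearest_dot": 0.25,
    "dist_to_nearest_ghost": 0.6,
    "in_tunnel": 0.0,
}
weights = {}
q_sa = q_value(weights, features)
weights = update_weights(weights, features, reward=1.0, next_max_q=0.0, current_q=q_sa)
```

The table of Q-values is replaced by the handful of weights, which is what makes large state spaces tractable.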

So my question is:

Is it possible for a reinforcement learning agent to create or discover such features on its own, instead of having them designed by hand?

Some research I have done:

This post mentions Geramifard's iFDD (incremental Feature Dependency Discovery) method.

" ", , , , .

, , apopos " Atari " , " ".

, / . , ?


Good question :)

With Q-learning (and SARSA) under function approximation, the features are normally designed by hand by someone who understands the problem domain; what the algorithm learns are the weights attached to those features, not the features themselves.

SARSA behaves the same way as Q-learning in this respect. There is research on constructing features automatically, but in practice, for a concrete task (say, Pac-Man), the feature set is usually chosen manually by whoever builds the agent.
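As a rough illustration of the "learn the features too" direction (my own sketch of the general idea behind the Atari work, not code from this answer), the raw state can be fed into a small neural network whose hidden layer plays the role of the features; its weights are adjusted from the TD error, so the features are learned rather than hand-designed. All names and sizes below are assumptions for the example.

```python
import numpy as np

class TinyQNetwork:
    """Raw state in, one Q-value per action out.

    The hidden layer acts as learned features: nothing about dots,
    ghosts or tunnels is hard-coded; the weights are shaped by the TD error.
    """

    def __init__(self, state_dim, n_actions, n_hidden=32, lr=0.01, gamma=0.99, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0, 0.1, (state_dim, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0, 0.1, (n_hidden, n_actions))
        self.b2 = np.zeros(n_actions)
        self.lr, self.gamma = lr, gamma

    def q_values(self, state):
        # ReLU hidden layer: these activations are the "learned features"
        h = np.maximum(0.0, state @ self.W1 + self.b1)
        return h, h @ self.W2 + self.b2

    def update(self, state, action, reward, next_state, done):
        h, q = self.q_values(state)
        _, q_next = self.q_values(next_state)
        target = reward + (0.0 if done else self.gamma * np.max(q_next))
        td_error = target - q[action]

        # Backpropagate the squared TD error by hand for the chosen action
        grad_q = -td_error
        self.W2[:, action] -= self.lr * grad_q * h
        self.b2[action] -= self.lr * grad_q
        grad_h = grad_q * self.W2[:, action] * (h > 0)
        self.W1 -= self.lr * np.outer(state, grad_h)
        self.b1 -= self.lr * grad_h

# Example usage with an 8-dimensional state and 4 actions
net = TinyQNetwork(state_dim=8, n_actions=4)
s = np.random.rand(8)
_, q = net.q_values(s)
```

Here the "features" are whatever the hidden layer learns to compute from the raw state, which is the sense in which hand-crafted features can be avoided.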

