Policy generalization for a model-based reinforcement learning algorithm with large state and action spaces

I am using a simulation-based reinforcement learning approach for autonomous flight.

In this project, I used a simulator to collect training data (state, action, final state) so that a Locally Weighted Linear Regression (LWLR) algorithm could learn the MODEL.
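As an illustration only, here is a minimal sketch of how such a model could be learned with LWLR from logged (state, action, next state) triples; this is not the author's code, and the function name, the bandwidth tau and the data layout are assumptions.

import numpy as np

# Hypothetical sketch: predict the next state for one (state+action) query vector
# using Locally Weighted Linear Regression over simulator logs.
def lwlr_predict(query_sa, X_sa, Y_next, tau=0.5):
    # X_sa  : (n, d) array of concatenated [state, action] training inputs
    # Y_next: (n, k) array of resulting next states
    # tau   : kernel bandwidth controlling how local the regression is
    n = X_sa.shape[0]

    # Gaussian weights: nearby samples influence the local fit more
    d2 = np.sum((X_sa - query_sa) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * tau ** 2))

    # Weighted least squares with a bias column: theta = (X'WX)^-1 X'W Y
    Xb = np.hstack([X_sa, np.ones((n, 1))])
    W = np.diag(w)
    theta = np.linalg.pinv(Xb.T @ W @ Xb) @ (Xb.T @ W @ Y_next)

    qb = np.append(query_sa, 1.0)
    return qb @ theta  # predicted next state vector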

The STATE is determined by the vector [Pitch, Yaw, Roll, Acceleration], which describes the position of the drone in space. When passed to the POLICY, it has one additional component: [WantedTrajectory].

The ACTION is also determined by a vector: [PowerOfMotor1, PowerOfMotor2, PowerOfMotor3, PowerOfMotor4].

The REWARD is calculated from how accurately the desired trajectory is followed: given the initial spatial state, the desired trajectory and the final spatial state, the closer the actually followed trajectory is to the desired one, the less negative the reward.
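As a concrete illustration of these definitions (a sketch with assumed names and values, not the author's data structures), the state, action and reward could be represented as follows; the mean squared deviation is an assumption, since the article does not specify the exact distance measure.

import numpy as np

# Assumed concrete representations of the quantities described above
state  = np.array([0.1, 0.0, -0.05, 9.8])   # [Pitch, Yaw, Roll, Acceleration]
action = np.array([0.6, 0.6, 0.55, 0.65])   # [PowerOfMotor1 ... PowerOfMotor4]

def reward(actual_trajectory, wanted_trajectory):
    # Less negative the closer the followed trajectory is to the desired one;
    # 0 is the best achievable reward.
    actual = np.asarray(actual_trajectory)
    wanted = np.asarray(wanted_trajectory)
    return -float(np.mean(np.sum((actual - wanted) ** 2, axis=1)))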

The policy iteration algorithm is the following (a runnable sketch is given after the loop):

start from a state S0

loop
         1) select the best action according to the Policy
         2) use LWLR to find the ending state
         3) calculate reward
         4) update generalized V function
endloop;
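Below is a minimal sketch of this loop, assuming a discretized set of candidate motor settings, the lwlr_predict helper sketched earlier, one desired state per step of the wanted trajectory, and a linear approximation of the generalized V function updated TD(0)-style. All names and the learning rate / discount values are assumptions, not the author's implementation.

import numpy as np

def features(state):
    # simple linear features with a bias term for the generalized V function
    return np.append(np.asarray(state, dtype=float), 1.0)

def run_episode(s0, wanted_points, candidate_actions, X_sa, Y_next,
                theta_v, alpha=0.01, gamma=0.95):
    s = np.asarray(s0, dtype=float)
    for target in wanted_points:
        # 1) select the best action according to the policy: the candidate whose
        #    LWLR-predicted ending state has the highest estimated value
        def value_of(a):
            s_next = lwlr_predict(np.concatenate([s, a]), X_sa, Y_next)
            return features(s_next) @ theta_v

        a = max(candidate_actions, key=value_of)

        # 2) use LWLR to find the ending state
        s_next = lwlr_predict(np.concatenate([s, a]), X_sa, Y_next)

        # 3) calculate reward: less negative when closer to the desired point
        r = -float(np.sum((s_next - np.asarray(target)) ** 2))

        # 4) update the generalized V function (TD(0) update on the linear weights)
        td_error = r + gamma * (features(s_next) @ theta_v) - (features(s) @ theta_v)
        theta_v = theta_v + alpha * td_error * features(s)

        s = s_next
    return theta_v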

Thus, the action taken also depends on the desired trajectory (selected by the user); the agent autonomously selects the power of the 4 motors (trying to follow the desired trajectory and obtain a larger, i.e. less negative, reward), and the policy is dynamic, because it depends on the continually updated value function.

The only problem is choosing the POLICY, defined as follows (with S = [Pitch, Yaw, Roll, Acceleration, WantedTrajectory]):

π(S) = argmax_a ( V( LWLR(S,a) ) )
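Since the 4 motor powers form a continuous action space, this argmax needs a concrete search strategy; one hedged option, not stated in the original, is to score a coarse grid of candidate motor settings with the learned model and value function, reusing the lwlr_predict and features helpers sketched above.

import numpy as np

# Assumed candidate grid: 5 power levels per motor, scored by V(LWLR(S, a))
levels = np.linspace(0.0, 1.0, 5)
candidate_actions = [np.array([p1, p2, p3, p4])
                     for p1 in levels for p2 in levels
                     for p3 in levels for p4 in levels]

def policy(state, candidate_actions, X_sa, Y_next, theta_v):
    # pi(S) = argmax_a V(LWLR(S, a)), approximated over the candidate set
    def value_of(a):
        s_next = lwlr_predict(np.concatenate([state, a]), X_sa, Y_next)
        return features(s_next) @ theta_v
    return max(candidate_actions, key=value_of)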


POLICY VALUE?
