I am using a simulation-based reinforcement learning approach for autonomous flight.
In this project, I used a simulator to collect training data in the form of (state, action, ending state) tuples, so that the Locally Weighted Linear Regression (LWLR) algorithm could learn the MODEL.
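To make this concrete, here is a minimal sketch of what LWLR could look like for this job: it predicts the ending state from a concatenated [state, action] query point. The function name, array shapes, and the bandwidth tau are my assumptions, not details from the project.

    import numpy as np

    def lwlr_predict(x_query, X_train, Y_train, tau=0.5):
        """Predict the ending state for one [state, action] query point.

        X_train: (n, d) rows of concatenated [state, action] vectors.
        Y_train: (n, k) rows of the ending states observed in the simulator.
        tau: kernel bandwidth -- controls how local the fit is.
        """
        # Gaussian weights: training points near the query dominate the fit.
        diffs = X_train - x_query
        w = np.exp(-np.sum(diffs ** 2, axis=1) / (2.0 * tau ** 2))

        # Weighted least squares with a bias column:
        # theta = (X' W X)^-1 X' W Y
        Xb = np.hstack([X_train, np.ones((X_train.shape[0], 1))])
        W = np.diag(w)
        theta = np.linalg.pinv(Xb.T @ W @ Xb) @ (Xb.T @ W @ Y_train)
        return np.append(x_query, 1.0) @ theta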
The STATE is determined by the vector [Pitch, Yaw, Roll, Acceleration], which describes the drone's position in space. When the POLICY is evaluated, the state gets one more feature: [WantedTrajectory].
The ACTION is likewise determined by a vector: [PowerOfMotor1, PowerOfMotor2, PowerOfMotor3, PowerOfMotor4].
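For example, with made-up numeric values the two vectors might look like this:

    import numpy as np

    # STATE: [Pitch, Yaw, Roll, Acceleration] (example values only)
    state = np.array([0.05, 1.20, -0.02, 9.81])

    # ACTION: [PowerOfMotor1..4], e.g. normalized to [0, 1]
    action = np.array([0.55, 0.52, 0.48, 0.50])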
The REWARD is computed from how accurately the desired trajectory is followed: given an initial spatial state, the desired trajectory, and the final spatial state, the closer the actually flown trajectory is to the desired one, the less negative the reward.
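A minimal sketch of such a reward, assuming both trajectories are sampled as (n, 3) arrays of positions and that the metric is the negative mean point-wise distance (the post does not specify the exact metric):

    import numpy as np

    def reward(actual_trajectory, wanted_trajectory):
        # Closer tracking -> smaller errors -> less negative reward.
        errors = np.linalg.norm(actual_trajectory - wanted_trajectory, axis=1)
        return -float(np.mean(errors))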
The policy iteration algorithm is the following (a code sketch follows the pseudocode):
start from a state S0
loop
1) select the best action according to the POLICY
2) use LWLR to find the ending state
3) calculate the reward
4) update the generalized V function
endloop;
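Here is one way that loop could look in code. The MODEL and the reward are passed in as callables, and V is approximated by a simple linear function updated with a TD(0) rule; all of this is a sketch under my own assumptions, not the project's actual implementation.

    import numpy as np

    def policy_iteration_loop(s0, candidate_actions, model, reward_fn,
                              steps=50, gamma=0.9, alpha=0.1):
        """model(s, a) -> ending state predicted by the LWLR MODEL;
        reward_fn(s)  -> reward of a state w.r.t. the wanted trajectory."""
        w = np.zeros_like(s0, dtype=float)      # weights of a linear V
        V = lambda s: float(w @ s)

        s = s0
        for _ in range(steps):
            # 1) best action according to the POLICY: argmax_a V(model(s, a))
            a = max(candidate_actions, key=lambda act: V(model(s, act)))

            # 2) use LWLR (the MODEL) to find the ending state
            s_next = model(s, a)

            # 3) calculate the reward
            r = reward_fn(s_next)

            # 4) TD(0) update of the (here: linear) V function
            td_error = r + gamma * V(s_next) - V(s)
            w += alpha * td_error * s

            s = s_next
        return w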
Thus the action taken also depends on the desired trajectory (selected by the user): the agent autonomously chooses the power of the four motors (trying to follow the desired trajectory and obtain a larger, i.e. less negative, reward), and the policy is dynamic, because it depends on the value function, which keeps being updated.
The only problem is that I am choosing the POLICY as follows (with S = [Pitch, Yaw, Roll, Acceleration, WantedTrajectory]):
π(S) = argmax_a ( V( LWLR(S,a) ) )
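Since the four motor powers are continuous, this argmax cannot be evaluated exactly. One common workaround (my assumption; the post does not say how the maximization is done) is to search over a coarse grid of candidate actions:

    import itertools
    import numpy as np

    def policy(s, V, model, power_levels=np.linspace(0.0, 1.0, 5)):
        """pi(S) = argmax_a V(LWLR(S, a)), searched on a grid of 5^4 = 625
        motor-power combinations. V and model are callables as above."""
        best_a, best_v = None, -np.inf
        for a in itertools.product(power_levels, repeat=4):
            a = np.asarray(a)
            v = V(model(s, a))   # value of the LWLR-predicted ending state
            if v > best_v:
                best_a, best_v = a, v
        return best_a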
Is this the right way to obtain the POLICY from the VALUE function?