Upgrading an Old System to Q-Learning with Neural Networks

Recently I have been reading a lot about Q-learning with neural networks, and I am thinking about upgrading an old optimization system in a power boiler. The existing system consists of a simple neural network with feedback that approximates an output from many sensor inputs; that output is then fed into a model-based linear controller, which in turn produces the optimal action, so that the whole system converges to the desired goal.

Defining the linear models is a time-consuming task, so I was thinking of replacing all of this with model-free Q-learning, using a neural network to approximate the Q-function. I drew a diagram to ask whether I am on the right track.

[diagram]

My question is: assuming I have understood the concept correctly, should my training set consist of state-feature vectors on one side and Q_target - Q_current on the other (here I assume the reward grows as the goal is approached), so that the whole model is pushed toward the goal, or is something missing?
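For concreteness, here is a minimal sketch of the training pairs I have in mind; the feature count, discount factor, and function name are placeholders, not part of the real system:

```python
import numpy as np

n_state_features = 8   # placeholder: number of sensor readings per state
gamma = 0.95           # placeholder: discount factor

def make_training_pair(state, reward, next_state_q_values, current_q_value):
    """Build one (input, target) pair the way I currently understand it:
    input  = state feature vector
    target = TD target for the action that was taken."""
    x = np.asarray(state, dtype=np.float32)              # state features
    q_target = reward + gamma * np.max(next_state_q_values)
    td_error = q_target - current_q_value                # the 'Q_target - Q_current' term
    return x, q_target, td_error
```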

Note: the diagram compares the old system (top) with the proposed change (bottom).

EDIT: Does the neural network need experience replay, i.e. should I store past state transitions and replay them during training?
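By experience replay I mean something like the following minimal buffer (plain Python; all names and the capacity are placeholders):

```python
import random
from collections import deque

class ReplayBuffer:
    """Stores past transitions (s, a, r, s', done) and samples random minibatches,
    so the Q-network is not trained only on consecutive, correlated states."""
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))
```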

Tags: python, artificial-intelligence, reinforcement-learning, machine-learning, tensorflow

Answer

You can simply have the network's output layer produce the Q-values of all actions for the current state. [roughly sketched diagram here]

This way you take advantage of the NN's ability to output multiple Q-values at once. Then simply backpropagate using the loss derived from the update rule Q(s, a) <- Q(s, a) + alpha * (reward + discount * max(Q(s', a')) - Q(s, a)), where max(Q(s', a')) can easily be read off the output layer.
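As a rough sketch of this idea in TensorFlow/Keras (layer sizes, learning rate, and the state/action dimensions are assumptions, not taken from the question):

```python
import numpy as np
import tensorflow as tf

# Assumed dimensions -- adjust to the boiler's sensor/action space.
n_state_features = 8      # number of sensor readings in the state vector
n_actions = 4             # number of discrete control actions
gamma = 0.95              # discount factor

# Q-network: state features in, one Q-value per action out.
q_net = tf.keras.Sequential([
    tf.keras.Input(shape=(n_state_features,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(n_actions),   # linear output layer: Q(s, a) for every action a
])
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)

def train_step(state, action, reward, next_state, done):
    """One Q-learning update on a single transition (s, a, r, s')."""
    state = np.asarray(state, dtype=np.float32).reshape(1, -1)
    next_state = np.asarray(next_state, dtype=np.float32).reshape(1, -1)

    # Bootstrapped target: r + gamma * max_a' Q(s', a'), read from the output layer.
    next_q = q_net(next_state).numpy()[0]
    target = float(reward) + (0.0 if done else gamma * float(np.max(next_q)))

    with tf.GradientTape() as tape:
        q_values = q_net(state)          # Q(s, .) for all actions at once
        q_sa = q_values[0, action]       # Q(s, a) of the action actually taken
        loss = tf.square(target - q_sa)  # squared TD error drives the backprop
    grads = tape.gradient(loss, q_net.trainable_variables)
    optimizer.apply_gradients(zip(grads, q_net.trainable_variables))
    return float(loss)
```

Note that only the Q-value of the action actually taken enters the loss here, which has the same effect as setting the labels for the other actions equal to the network's current outputs.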

Please let me know if you have further questions.
