Recently I have been reading a lot about Q-learning with neural networks, and I am thinking about upgrading an old optimization system in a power-plant boiler. The existing system consists of a simple feedback neural network that approximates an output from many sensory inputs; that output is then fed into a model-based linear controller, which somehow outputs an optimal action in turn, so that the whole model converges to the desired goal.
Identifying the linear models is a time-consuming task. I was thinking of replacing all of this with model-free Q-learning, using a neural network to approximate the Q-function. I drew a diagram to ask whether I am on the right track.

My question is: assuming I have understood the concept correctly, should my training set consist of state-feature vectors on one side and Q_target - Q_current (here I assume an increasing reward) on the other, in order to drive the whole model toward the goal, or am I missing something?
Note: the diagram compares the old system (top) with the proposed change (bottom).
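To make the update I have in mind concrete, here is a rough sketch of how I picture the Q-network training step: state features go in, the network outputs one Q-value per action, and the TD error Q_target - Q_current (with Q_target = r + gamma * max_a' Q(s', a')) is minimized via an MSE loss on the action that was actually taken. The layer sizes, number of actions, and discount factor below are placeholders, not values from the real boiler system:

import numpy as np
import tensorflow as tf

n_features = 8    # assumed size of the state-feature vector
n_actions = 4     # assumed number of discrete control actions
gamma = 0.99      # discount factor (placeholder)

q_net = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation='relu', input_shape=(n_features,)),
    tf.keras.layers.Dense(n_actions, activation='linear'),  # one Q-value per action
])
q_net.compile(optimizer='adam', loss='mse')

def train_step(states, actions, rewards, next_states, dones):
    # One Q-learning update on a batch of transitions (s, a, r, s', done).
    q_current = q_net.predict(states, verbose=0)       # Q(s, .) for all actions
    q_next = q_net.predict(next_states, verbose=0)     # Q(s', .) for all actions
    targets = q_current.copy()
    # Overwrite only the entry of the action actually taken with the TD target,
    # so the MSE loss reduces (Q_target - Q_current) for that action.
    batch_idx = np.arange(len(states))
    targets[batch_idx, actions] = rewards + gamma * (1 - dones) * q_next.max(axis=1)
    return q_net.train_on_batch(states, targets)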
EDIT: Does the state neural network require experience replay?
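By experience replay I mean keeping a buffer of past transitions (s, a, r, s', done) and sampling random minibatches from it for each update, instead of learning only from the most recent sensor reading. A minimal sketch of what I have in mind (the buffer capacity and batch size are placeholder values), which would feed the train_step function sketched above:

import random
from collections import deque

import numpy as np

replay_buffer = deque(maxlen=10000)   # assumed capacity

def store(state, action, reward, next_state, done):
    # Record one observed transition from the boiler environment.
    replay_buffer.append((state, action, reward, next_state, done))

def sample_batch(batch_size=32):
    # Draw a random minibatch of past transitions for one update.
    batch = random.sample(replay_buffer, batch_size)
    states, actions, rewards, next_states, dones = map(np.array, zip(*batch))
    return states, actions, rewards, next_states, dones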