I am working on a reinforcement learning task and decided to use a Keras neural network model to approximate the Q-value. The approach is standard: after each action, the transition and its reward are stored in a replay memory array, then I draw a random sample from it and fit the model on the new data, state-action => reward + predicted_Q (more details here ). To complete one training step, the Q-value must be predicted for every element of the training sample.
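The training step looks roughly like this (a minimal sketch under my assumptions; `predict_q` is a stand-in for the Keras `model.predict` call, and `GAMMA`/`BATCH_SIZE` are illustrative values, not my exact settings):

```python
import random
from collections import deque

GAMMA = 0.95          # discount factor (assumed value)
BATCH_SIZE = 32       # size of the random sample drawn from replay memory

# Each entry is a (state, action, reward, next_state) tuple
replay_memory = deque(maxlen=10000)

def predict_q(state):
    # Stand-in for model.predict(state): returns Q-values for each action.
    # In the real code this is the Keras forward pass that is so slow.
    return [0.0, 0.0]

def training_batch():
    """Build (input, target) pairs from a random sample of the replay memory."""
    sample = random.sample(list(replay_memory),
                           min(BATCH_SIZE, len(replay_memory)))
    inputs, targets = [], []
    for state, action, reward, next_state in sample:
        # Bellman target: reward plus discounted best predicted Q of the next state
        target = reward + GAMMA * max(predict_q(next_state))
        inputs.append((state, action))
        targets.append(target)
    return inputs, targets
```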
The script is very slow, so I started investigating. Profiling shows that 56.87% of the cumulative time is spent in the `_predict_loop` method:
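For reference, the cumulative numbers come from `cProfile` sorted by cumulative time; a sketch of how to reproduce that kind of report (the profiled function here is just a placeholder for the prediction code path):

```python
import cProfile
import io
import pstats

def slow_prediction():
    # Placeholder for the code path that ends in Keras' _predict_loop
    total = 0
    for i in range(100000):
        total += i * i
    return total

profiler = cProfile.Profile()
profiler.enable()
slow_prediction()
profiler.disable()

# Sort by cumulative time, the column where _predict_loop showed up at ~57%
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(10)
print(stream.getvalue())
```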
And that looks strange, because prediction is just a forward pass: a handful of matrix multiplications. The model I use is very small: 8 inputs, 5 nodes in a single hidden layer, 1 output.
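To put that in perspective, the entire forward pass for such a network is just two tiny matrix multiplications; a NumPy sketch (the weights are random placeholders and sigmoid is an assumed activation, since the question does not state the actual one):

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder weights for an 8 -> 5 -> 1 network, biases included
W1, b1 = rng.standard_normal((8, 5)), np.zeros(5)
W2, b2 = rng.standard_normal((5, 1)), np.zeros(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(state):
    """One forward pass: 8*5 + 5*1 = 45 multiply-accumulates plus activations."""
    hidden = sigmoid(state @ W1 + b1)
    return hidden @ W2 + b2

q = forward(rng.standard_normal(8))
```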
I installed and configured CUDA and ran some test cases; they show that the GPU is being used, and I also see a heavy load on the GPU. But when I run my own code, although the message "Using gpu device 0: GeForce GT 730" appears, the GPU load stays very low (about 10%).
Why is prediction so slow, and why is the GPU barely utilized?