Python Libraries for Online Machine Learning MDP

I am trying to develop an iterative Markov Decision Process (MDP) agent in Python with the following characteristics:

  • observable state
    • I handle a potential "unknown" state by reserving some state space for answering query-type moves made by the DP (the state at t + 1 will identify the previous query [or zero if the previous move was not a query] as well as the embedded result vector); this space is padded with 0s to a fixed length to keep the state frame aligned regardless of the query answered (whose data lengths may vary; see the padding sketch after this list)
  • actions that may not always be available in all states
  • the reward function may change over time
  • policy convergence should be incremental and only computed per move
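
To make the state-framing point concrete, here is a minimal sketch of the zero-padding idea; `frame_state`, `MAX_RESULT_LEN`, and the query ids are my own placeholders under stated assumptions, not part of any library:

```python
import numpy as np

# Hypothetical constant for illustration only.
MAX_RESULT_LEN = 8  # fixed-length slot reserved for the embedded result vector

def frame_state(prev_query_id, result_vector):
    """State at t+1: [previous query id (0 if the last move was not a query),
    result vector zero-padded to MAX_RESULT_LEN]."""
    padded = np.zeros(MAX_RESULT_LEN)
    result = np.asarray(result_vector, dtype=float)[:MAX_RESULT_LEN]
    padded[:len(result)] = result  # pad with 0s so every frame has the same length
    return np.concatenate(([float(prev_query_id)], padded))

# A query move and a non-query move produce frames of identical shape:
frame_state(3, [0.4, 1.2])  # length-9 vector starting with 3.0
frame_state(0, [])          # length-9 vector of zeros
```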

Thus, the main idea is that the MDP should make its best optimized move at T using its current probabilistic model (and since the model is probabilistic, the move it makes is expectedly stochastic, implying possible randomness), then pair the new input state at T + 1 with the reward from the previous move at T and re-evaluate the model. Convergence must not be permanent, because rewards may modulate or the available actions may change.
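
In code, that cycle might look like the following skeleton; `agent`, `env`, and every method name here are assumptions standing in for whatever library (or custom code) ends up providing them:

```python
# Per-move loop: act at T, observe (state, reward) at T + 1, re-evaluate the model.
def run(agent, env, n_moves):
    state = env.reset()
    for t in range(n_moves):
        actions = env.available_actions(state)   # the action set may vary by state
        action = agent.choose(state, actions)    # sampled from the current model, hence stochastic
        next_state, reward = env.step(action)    # reward for move T arrives at T + 1
        agent.update(state, action, reward, next_state)  # incremental, per-move update
        state = next_state
```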

I would like to know whether there are any existing Python libraries (preferably cross-platform, since I will definitely be switching environments between Windoze and Linux) that can already do this sort of thing, or can support it with suitable customization, e.g. derived-class support that allows redefining, say, the reward method with one's own.
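
For illustration, the kind of customization hook I have in mind would look roughly like this; both classes are hypothetical, not taken from any existing library:

```python
class MDPAgentBase:
    """Hypothetical library base class."""
    def reward(self, state, action, next_state):
        raise NotImplementedError  # the default the library would let you replace

class MyAgent(MDPAgentBase):
    def reward(self, state, action, next_state):
        # Domain-specific (and possibly time-varying) logic; the condition
        # below is an arbitrary example.
        return 1.0 if next_state[0] == 0 else -0.01
```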

I find information about online learning for MDPs pretty scarce. Most uses of MDPs that I can find seem to focus on solving for the entire policy as a preprocessing step.

2 answers

I'm a grad student doing a lot of MCMC work in Python, and as far as I know, nothing implements MDPs directly. The closest thing I am aware of is PyMC. Poking around its documentation turns up some tips for extending its classes, but they definitely do not have rewards, etc. available out of the box.

If you are serious about developing something good, you might consider extending and subclassing the PyMC machinery to build decision processes; you could then get it included in a future PyMC release and help many people down the road.


Here is the Python toolbox for MDPs.

Caveat: this is for vanilla textbook MDPs, not for partially observable MDPs (POMDPs) or any kind of non-stationarity in rewards.

Second caveat: I found the documentation really sparse. You have to look at the Python code if you want to know what it implements, or quickly skim the documentation of the similar MDP toolbox for MATLAB.
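
For a sense of what using it looks like, here is the quickstart from the pymdptoolbox documentation, assuming that package is the toolbox meant above. Note that it computes the whole policy in one batch, which illustrates the first caveat:

```python
import mdptoolbox
import mdptoolbox.example

# Built-in forest-management example: P is the transition model, R the rewards.
P, R = mdptoolbox.example.forest()
vi = mdptoolbox.mdp.ValueIteration(P, R, 0.9)  # 0.9 is the discount factor
vi.run()
print(vi.policy)  # the full policy, computed offline rather than per move
```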

