Graph Theory - Learning a Cost Function to Find the Best Path

This is a supervised learning problem.

I have a directed acyclic graph (DAG). Each edge has a feature vector X, and each node (vertex) is labeled 0 or 1. The task is to learn a cost function w(X) such that the shortest path between any pair of nodes has the highest ratio of 1-labeled to 0-labeled nodes (minimal classification error).

The solution should generalize well. I tried logistic regression, and the learned logistic function predicts the node labels quite well given the edge feature vectors as input. However, this approach does not take the graph topology into account, so the solution over the whole graph is not optimal. In other words, the logistic function is not a good weight function in the setting above.
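For reference, a minimal sketch of the logistic-regression baseline I mean (using scikit-learn; the variable names and random data are just placeholders for my real edge features and node labels):

```python
# Sketch of the per-edge logistic baseline: predict the label of the node an
# edge points to from that edge's feature vector. Data here is synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

edge_features = np.random.rand(100, 5)           # X_i for each edge e_i (placeholder)
head_labels = np.random.randint(0, 2, size=100)  # 0/1 label of the node e_i points to

clf = LogisticRegression()
clf.fit(edge_features, head_labels)

# f(X_i): estimated probability that edge e_i points to a node labeled 1
p_one = clf.predict_proba(edge_features)[:, 1]
```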

Although my problem is not a typical binary classification setup, here is a good introduction to that framework: http://en.wikipedia.org/wiki/Supervised_learning#How_supervised_learning_algorithms_work

Here are some more details:

  • Each feature vector X is a d-dimensional list of real numbers.
  • Each edge has a feature vector. That is, given the set of edges E = {e1, e2, ..., en} and the set of feature vectors F = {X1, X2, ..., Xn}, edge ei is associated with vector Xi.
  • A function f(X) can be learned such that f(Xi) gives the probability that edge ei points to a node labeled 1. An example of such a function is the one I mentioned above, learned through logistic regression. However, as I said, such a function is not optimal.

SO THE QUESTION: Given the graph, a start node, and an end node, how do I learn an optimal cost function w(X) such that the ratio of 1-labeled to 0-labeled nodes along the shortest path is maximized (minimal classification error)?
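For concreteness, here is a rough sketch of how I would score a candidate cost function w(X): turn it into edge weights, run a shortest-path search, and measure the ratio of 1-labeled to 0-labeled nodes along the resulting path. It assumes a networkx DiGraph whose edges carry an "X" feature vector and whose nodes carry a 0/1 "label"; all names are illustrative.

```python
import networkx as nx

def path_one_ratio(G, w, source, target):
    """Score a candidate cost function w on one (source, target) pair."""
    H = G.copy()
    for u, v, data in H.edges(data=True):
        H[u][v]["weight"] = w(data["X"])          # edge cost from its feature vector
    # Bellman-Ford tolerates negative edge costs; a DAG has no cycles anyway.
    path = nx.shortest_path(H, source, target, weight="weight", method="bellman-ford")
    labels = [G.nodes[n]["label"] for n in path]
    ones = sum(labels)
    return ones / max(len(labels) - ones, 1)      # ratio of 1s to 0s (guard against /0)
```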

+7
3 answers

This seems like a problem where a genetic algorithm has excellent potential. If you define the form of the desired function, e.g. (but not limited to) a linear combination of the features (you could add quadratic terms, then cubic, ad infinitum), then a gene is a vector of coefficients. The mutator can simply be a random perturbation of one or more coefficients within a reasonable range. The fitness function is simply the average ratio of 1s to 0s along the shortest paths over all pairs under the current candidate. In each generation, select the best few genes as parents and mutate them to form the next generation. Repeat until a good enough gene emerges.
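A rough sketch of what I have in mind (Python; `path_one_ratio` is the illustrative scoring helper sketched in the question, the gene is a coefficient vector for a linear cost w(X) = c·X, and all hyperparameters are arbitrary):

```python
import random

def fitness(G, coeffs, node_pairs):
    # Average ratio of 1s to 0s along shortest paths for the sampled pairs.
    w = lambda X: sum(c * x for c, x in zip(coeffs, X))
    return sum(path_one_ratio(G, w, s, t) for s, t in node_pairs) / len(node_pairs)

def evolve(G, node_pairs, dim, pop_size=30, generations=50, elite=5, sigma=0.1):
    population = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population, key=lambda g: fitness(G, g, node_pairs), reverse=True)
        parents = scored[:elite]
        # Next generation: keep the parents, fill the rest with mutated copies.
        population = parents + [
            [c + random.gauss(0, sigma) for c in random.choice(parents)]
            for _ in range(pop_size - elite)
        ]
    return max(population, key=lambda g: fitness(G, g, node_pairs))
```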

+1

This is not really an answer, but the question needs some clarification. I may come back later with a possible answer.

The following is an example of a DAG.

[example DAG image]

Suppose the red node is the start node and the yellow node is the end node. How do you define the shortest path in terms of the

highest ratio of 1s to 0s (minimum classification error)?

Edit: I have added names for each node and two example feature vectors for the top two edges.

It seems to me that you cannot learn a cost function that takes feature vectors as input and whose output (edge weights or anything else) helps you find the shortest path to any node while taking the graph topology into account. The reasons are listed below:

  • Suppose you did not have the feature vectors you specified. Given the graph above, if you want to find all-pairs shortest paths under the ratio of 1s to 0s, it is perfectly fine to use Bellman-Ford, or more specifically a Dijkstra-style search plus a suitable heuristic function (for example, the percentage of 1s on the path so far), i.e. A*. Another possible model-free approach is Q-learning, in which we receive a reward of +1 for visiting a 1-labeled node and -1 for visiting a 0-labeled node. We learn a lookup Q-table for each target node, one at a time; in the end we have all-pairs shortest paths where every node has been considered as a target node (see the sketch after this list).

  • Now suppose you magically obtain the feature vectors. Since you can find the optimal solution without them, how would they help when they do exist?

  • There is one possible case in which you could use the feature vectors to learn a cost function that optimizes the edge weights, namely when the feature vectors depend on the graph topology (the links between nodes and the positions of the 1s and 0s). But I do not see that dependency in your description at all, so I assume it does not exist.
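For illustration, a rough sketch of the Q-learning variant mentioned in the first bullet (Python, assuming a networkx DiGraph whose nodes carry a 0/1 "label"; rewards and hyperparameters are arbitrary, and this learns a table for a single target node):

```python
import random
from collections import defaultdict

def q_learn(G, target, episodes=2000, alpha=0.1, gamma=0.95, eps=0.2):
    Q = defaultdict(float)                        # Q[(state, next_state)]
    starts = [n for n in G if n != target]
    for _ in range(episodes):
        s = random.choice(starts)
        # Episodes terminate at the target or at a dead end; a DAG has no cycles.
        while s != target and list(G.successors(s)):
            actions = list(G.successors(s))
            if random.random() < eps:             # epsilon-greedy exploration
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda n: Q[(s, n)])
            r = 1 if G.nodes[a]["label"] == 1 else -1
            best_next = max((Q[(a, n)] for n in G.successors(a)), default=0.0)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = a
    return Q
```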

+5

I believe your question is very close to the area of Inverse Reinforcement Learning, where you take certain "expert demonstrations" of optimal paths and try to learn a cost function such that your planner (A* or some reinforcement learning agent) produces the same paths as the expert demonstrations. The learning is iterative. I think that in your case the expert demonstrations could be created by you, as paths that pass through as many 1-labeled nodes as possible. Here is a link to a good paper on the topic: Learning to Search: Functional Gradient Techniques for Imitation Learning. It is from the robotics community, where motion planning is usually set up as a graph search problem and cost functions are learned to reproduce demonstrated behavior.
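A very rough sketch of the iterative idea (this is a plain subgradient update in the spirit of maximum-margin planning, not the exact functional-gradient method from the linked paper; it assumes a networkx DiGraph with an "X" feature vector on each edge, and all names and constants are illustrative):

```python
import numpy as np
import networkx as nx

def path_features(G, path):
    # Sum of edge feature vectors (the 'X' attribute) along a path.
    return sum(np.asarray(G[u][v]["X"]) for u, v in zip(path, path[1:]))

def plan(G, w, source, target):
    # Plan the cheapest path under the current linear edge costs w . X.
    H = G.copy()
    for u, v, data in H.edges(data=True):
        H[u][v]["weight"] = float(w @ np.asarray(data["X"]))
    return nx.shortest_path(H, source, target, weight="weight", method="bellman-ford")

def learn_cost_weights(G, expert_path, source, target, dim, iters=100, lr=0.01):
    w = np.zeros(dim)
    expert_feats = path_features(G, expert_path)
    for _ in range(iters):
        planned = plan(G, w, source, target)
        # Raise the cost of the path the planner currently prefers and lower
        # the cost of the expert demonstration until they agree.
        w += lr * (path_features(G, planned) - expert_feats)
    return w
```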

+1
