Creating a ranking using a genetic algorithm,

Question after the release of BIG:

I need to build a ranking using a genetic algorithm, I have the following data:

P(a>b)=0.9 P(b>c)=0.7 P(c>d)=0.8 P(b>d)=0.3 

now, interprets a,b,c,d as the names of football teams, and P(x>y) is the probability that x will win with y . We want to create a team rating, we are missing some observations P(a>d) , P(a>c) absent due to the lack of matches between vs d and vs c. The goal is to find the order of the team names that best describes the current situation in these four team leagues.

If we have only 4 teams than a simple solution, first we calculate the probabilities for all 4!=24 orders from four teams, ignoring the missing values:

P(abcd)=P(a>b)P(b>c)P(c>d)P(b>d)

P(abdc)=P(a>b)P(b>c)(1-P(c>d))P(b>d)

...

P(dcba)=(1-P(a>b))(1-P(b>c))(1-P(c>d))(1-P(b>d))

and we choose the ranking with the highest probability. I do not want to use any other fitness function.

My question is:

Like the number of permutations of n elements n! the calculation of the probabilities for all orderings is impossible for large n (my n is about 40). I want to use a genetic algorithm for this problem.

The mutation operator is a simple switching of the places of two (or more) ranking elements.

But how to make a crossover of two orders?

Can P(abcd) interpreted as a cost function of the path “abcd” in the asymmetric TSP problem, but the cost of moving from x to y differs from the cost of moving from y to x, P(x>y)=1-P(y<x) ? There are so many crossover operators for the TSP problem, but I think I need to develop my own crossover operator because my problem is slightly different from TSP. Do you have ideas for a solution or a framework for conceptual analysis?

The easiest way, at the conceptual and implementation level, is to use the crossover operator, which makes sub-users exchange between two solutions:

CrossOver(ABcD,AcDB) = AcBD

for a random subset of elements (in this case, "a, b, d" in capital letters), we copy and paste the first subordination - the sequence of elements "a, b, d" in the second order.

Edition : Asymmetric TSP can be turned into a symmetric TSP, but with forbidden suborders that make the GA approach unusable.

+3
optimization genetic-algorithm ranking mathematical-optimization traveling-salesman
source share
3 answers

This is definitely an interesting problem, and it seems that most of the answers and comments have focused on the semantic aspects of the problem (i.e. on the meaning of the fitness function, etc.).

I will talk about some syntax elements - how you perform crossover and / or mutation in ways that make sense. Obviously, as you noted in parallel with TSP, you have a permutation problem. Therefore, if you want to use GA, the natural representation of candidate decisions is simply an ordered list of your points, carefully avoiding repetition, i.e. Permutations.

TSP is one of these permutation problems, and there are a number of crossover operators (for example, crossover crossovers ) that you can take from TSP algorithms and use directly. However, I think you will have problems with this approach. Basically, the problem is this: in TSP, adjacency is an important quality of solutions. That is, abcd has the same suitability as cdab, because it is the same tour, just starting and ending in another city. In your example, absolute position is much more important than this concept of relative position. abcd means in a sense that a is the best point - it is important that it appears first in the list.

The key thing you must do to get an effective crossover operator is to take into account that the properties in the parents make them good, and try to extract and combine these properties. Nick Radcliffe called this "respectful recombination" (note that the paper is quite old, and now the theory is understood a little differently, but the principle sounds). Taking the operator designed by TSP and applying it to your problem will lead to the creation of offspring that tries to store irrelevant information from the parents.

You ideally need an operator that tries to maintain an absolute position in the string. The best I know is called Cycle Crossover (CX). I miss the good help from my head, but I can point you to some code where I implemented it as part of my work with graduates . The basic idea of ​​CX is quite difficult to describe and much easier to see in action. Take the following two points:

 abcdefgh cfhgedba 
  • Select a random starting point in parent 1. For simplicity, I am just starting from position “0” with “a”.

  • Now go straight down to parent 2 and watch the value there (in this case, "c").

  • Now find the “c” in parent 1. We will find it at position 2.

  • Now go straight down and observe the “h” at parent 2, at position 2.

  • Look for this “h” in parent 1, found at position 7.

  • Go down and watch "a" in parent 2.

  • At this point, note that if we look for “a” in the parent, we will reach the position where we were already. An ongoing past that will simply cycle. In fact, we call the sequence of positions that we visited (0, 2, 7), "cycle". Please note that we can simply exchange values ​​at these positions between parents as a group, and both parents retain the permutation property, because we have the same three values ​​in each position of the cycle for both parents in only different orders.

  • Replace the items included in the cycle.

Please note that this is only one cycle. Then you repeat this process, starting with a new (invisible) position every time until all positions are included in the cycle. After one iteration described above, you get the following lines (where "X" indicates the position in the loop at which the values ​​were exchanged between the parents.

 cbhdefga afcgedbh XXX 

Just keep searching and replacing loops until you are done.

The code associated with my github account will be tightly tied to my own metaheuristics structure, but I believe that it is a fairly simple task to get the basic algorithm out of the code and adapt it for your own system.

Please note that you can win quite a bit to do something more customized for your specific domain. I think something like CX will make a better black box algorithm than something based on the TSP operator, but black boxes are usually the last resort. Other people's suggestions may lead to a better algorithm.

+3
source share

I worked on a somewhat similar ranking problem and followed a technique similar to the one I describe below. This works for you:

Suppose that an unknown value object deviates from your estimate through some distribution, say, a normal distribution. Interpret your ranking statements, such as a > b, 0.9 , as the expression "The value of a lies at the 90% percentile of the distribution centered on b ."

For each statement:

 def realArrival = calculate a location on a distribution centered on b def arrivalGap = | realArrival - expectedArrival | def fitness = Σ arrivalGap 

Fitness Function MIN(fitness)

FWIW, my problem was actually the problem of packing a basket, where the equivalent of your “rank” applications was a user-assigned rating (1, 2, 3, etc.). So not exactly TSP, but NP-Hard. OTOH, bin packaging has a pseudo-polynomial solution proportional to the error accepted, and this is what I ended up using. I'm not quite sure that I will work with your probabilistic ranking statements.

+2
source share

What an interesting problem! If I understand this, then you really ask:

"Given a weighted oriented graph with each graph weight in the graph representing the probability that the arc is drawn in the right direction, return the complete sequence of nodes with the maximum probability of being a topological view of the graph."

So, if your graph has N edges, there are 2 ^ N graphs of different probability, with some orders appearing in more than one graph.

I don’t know if this will help (very short Google searches didn’t enlighten me, but maybe you will have more success with more perseverance), but I think that the search for “topological sorting” combined with any of the “probabilistic” ones, “random”, “noise” or “error” (since the weight of the ribs can be considered a reliability factor) can be useful.

I highly doubt your statement, in your example, that P (a> c) is not required. You know your application space best, but it seems to me that specifying P (a> c) = 0.99 will give a different suitability for f (abc) than specifying P (a> c) = 0.01.

You might also want to cast Bayesian, as you can start to output values ​​for (in your example) P (a> c) based on your conditions and hypothetical decisions. The problem is that “topological sorting” and “Bayesian sorting” are going to give you a whole bunch of hits related to brand chains and brand decision problems, which may or may not be useful.

+2
source share

All Articles