Timeline / Chronology Algorithms

I'm looking for algorithmic leads to get the chronology / chronology of a series of novels. I divided the texts into several days and created a database of relations between them, for example: X a month before Y, Y and Z are consecutive, date Z is known, X on Tuesday, etc. There is uncertainty (“month” really only means about 30 days), as well as contradictions. I can mark some relationships as more reliable than others to help eliminate ambiguity and contradictions.

Are there any algorithms for determining the best fit of the chronology of this type of data, assigning the date of the highest probability to each day? At least the time is 1-dimensional, but considering a complex relationship schedule with inconsistency seems nontrivial. I have a CS background, so I can code something, but some insight into the names of the applicable algorithms would be helpful. I assume that I have a graph with days as nodes as relationships as edges.

+7
algorithm graph-algorithm
source share
2 answers

Programming constraints is what you need. In distribution-based CPs, you alternate (a) making a decision at the current selection point in the search tree and (b) distributing the consequences of that decision as much as possible. It is clear that you do this by maintaining a domain D possible values ​​for each problem variable x in such a way that D(x) is a set of values ​​for x that are not yet excluded along the current search path. In your problem, you can reduce it to a large set of logical variables, x_ij , where x_ij is true if event i precedes event j . Initially, D(x) = {true, false} for all variables. The solution simply reduces the scope of the undefined variable (for a Boolean variable, this means that its domain is reduced to a single value, true or false, which coincides with the purpose). If at any point in the search path D(x) becomes empty for any x , you have reached a dead end and must retreat.

If you are smart, you will try to learn from each failure and backtrack as soon as you back up the search tree to avoid redundant searches (this is called backjumping - for example, if you find that the end you reached at level 7 was triggered by a selection, which you made at level 3, there is no point in retreating only to level 6, because there is no solution in this subtree, given the choice you made at level 3!).

Now, given that you have varying degrees of trust in your data, you really have an optimization problem. That is, you are not just looking for a solution that satisfies all the constraints that need to be true, but that also best satisfy the other soft constraints, depending on the degree of trust you have. What you need to do here is to solve an objective function that assigns an assessment to a specific set of satisfied / violated partial constraints. Then you want to shorten your search when you find that the current search path cannot improve the best solution found earlier.

If you decide to go for a logical approach, you can enjoy exploring SAT solvers that break these problems. But the first thing I would like to see is MiniZinc , a CP language that maps to many different modern constraint solvers.

Good luck

0
source share

A simple rough first approximation to your problem would be to store information like "A that happened before B" in an oriented graph with edges like "A → B". Check the graph to see if it is a direct acyclic graph (DAG). If so, the information is consistent in the sense that there is a consistent chronology of what happened before what else. You can get a sample linear chronology by typing “topological sorting” (top) of the DAG. If events C and D occurred at the same time or there is no information to say what happened before the other, they can be displayed in the upper array as ABCD or ABDC. You can even get a topsort algorithm to print all the features (like ABCD and ABDC) for further analysis using more detailed information.

If the resulting graph is not a DAG, you can use an algorithm similar to the Tarjan algorithm to quickly determine the “tightly connected components” that are areas of the graph that contain chronological inconsistencies in the form of loops. Then you could analyze them more carefully to determine which less reliable edges can be removed to resolve the contradictions. Another way to identify edges to remove to eliminate loops is to look for “minimal sets of feedback arc”. This NP is hard at all, but if your tightly connected components are small, a search can be feasible.

0
source share

All Articles