I have a DAG. I have this operation to add an edge between two nodes.
If A is available from B, then B is the parent. If A can be reached from B without going through another node, then B is the direct parent.
Requirements for this schedule:
- No cycles.
- For any node, there is a list of direct parents P [1], P [2], P [3] ... P [i] is not a parent of P [j] for any i and k.
When adding an edge, requirement 1 is not satisfied, the edge is not built. When adding an edge, requirement 2 is not satisfied, the edge is built, but the direct parents will be modified so that requirement 2 is satisfied.
For example, there are 3 nodes
- A, direct parents: none
- B, direct parents: A
- C, direct parents: A
now if i add an edge between B and C we have
but A is the parent of B, does not meet requirement 2, therefore A is removed from the direct parent of C, and we have
Currently, I have done the following: Add an edge from A to B (this A will become the parent of B)
- Check if B is the parent of the BFS. If so, don't add an edge (make sure there are no loops)
- Check if AB is already B the parent of the BFS. If so, do not add an edge.
- Find the intersection of the parent with the direct parent of B. This is done by looking for each parent of A, although BFS. Remove the intersection from direct parent B and add A as direct parent of B. (2 and 3 make sure it meets requirement 2)
It is slow. It breaks down to the 5k node level (I'm looking for this to process any graph with nodes less than 100k), the speed becomes unacceptable, 0.02 seconds to add a node edge.
I have a feeling that steps 1 and 2 can be done in 1 step using some other algorithm.
I was thinking about using topological ordering, but it should cross the entire graph, which is the worst case for my steps 1 and 2. When adding a new node, the order will be out of order. therefore, I have to run the topological ordering for the insert each time, so it does not bring any benefit. For step 3, you need to find the entire set of parents. The process is rather slow, since on average it is across a decent part of the schedule.
How can I make it more efficient?