Concept
Your problem consists mainly of three steps:
- identify related components of the undirected graph
G - represent each connected component with its vertex with a minimum id
- match each edge of the graph with a representative of the connected component to which it belongs.
The problem is solved as follows:
- taking into account the "equivalent" directed graph
D_1 based on the natural numerical ordering of the vertex vertices of G - abbreviation
D_1 to a bipartite graph B containing all candidate identifiers by abstracting the chains in the ordering represented by D_1 . - definition of minimum representative identifiers in
B - establishing a map from
G edges to edges of B and, therefore, its loosely coupled components (WCC).
More details
The set of edges G is exactly the set of pairs (id1, id2) specified by the unique_pairs records. The set of vertices G corresponds to the set of values โโfrom id unique_pairs columns. Since G not oriented, it does not matter whether to represent an edge by writing (id1, id2) or (id2, id1) ; in the future we will assume wlog that id1 < id2 is executed for each record (id1, id2) in unique_pairs .
We solve the problem by constructing an auxiliary structure based on the digraph D_1 obtained from G by orienting each edge of the undirected graph in accordance with the numerical order of the identifiers of its falling vertices. Let D_2 be D_1 with all the opposite orientations of the edges. Identifying the connected components of the source graph means identifying the WCC at D_1 or D_2 . Note that D_1 is acyclic in construction.
The auxiliary auxiliary structure is a mapping of the set of edges in G onto the set of cardinalities of maximal chains (= path not contained in another path) in D_1 , so that all chains in the image of an edge e contain e and maximize the difference between the numerical identifier of its terminals (= initial and final tops). The use of power plants is technical, because otherwise the comparison may be inaccurate if the two maximum chains have the same terminals but follow โalternative routesโ (usually the image will contain one element).
In the next step, we truncate the map: instead of the sets of maximum chains, the image will be a pair of terminal ids of any maximum chain from the original image.
With this mapping, we construct an auxiliary bipartite graph B whose edges exactly represent ordered pairs of terminal identifiers. The graph is bipartite, since no final vertex of the maximum chain can be the starting vertex of another maximum chain, and vice versa - if this happened, then the chains would not be maximum.
However, by construction, B forms part of the transitive closure D_1 . Therefore, the vertex sets B WCC are subsets of the vertex sets D_1 WCC. We are ultimately interested in the top with a minimum id in each of the latest WCC. Since no final identifier contained in the maximum chain can be less than the identifier of the beginning of the vertex, all these vertices must be contained in the set of vertices B Therefore, we should consider B WCC with only the minimum terminal identifier of each of them. Together with the additional mapping of all the edges of G to the edges of B and, therefore, to B WCC, we solve the original problem.
SQL implementation
At the preliminary processing stage, the edges G normalized to the natural ordering of its falling vertices. Assuming that the orientation of the edges is id1 -> id2 , test_unique_pairs also represents D_1 .
update test_unique_pairs set id1 = id2 , id2 = id1 where id1 > id2 ;
The first query maps the edges D_1 to the edges of B , i.e. pairs of terminal ids of maximal chains in D_1 . These chains are acyclic in construction and will be calculated by hierarchical query on test_unique_pairs , and they will be represented by their start and end vertices, and these pairs also represent edges in B This mapping takes place modulo alternative circuits separating both terminals. Different maximum paths are considered equivalent if they share both terminals. Hierarchical queries in the oracle make it easy to extract maximum elements. However, the minimum elements are the maximum elements of the hierarchy with the inverse relationship between parents and children (to satisfy these calculations is the reason for considering both digraphs D_1 , D_2 ).
To facilitate the second stage of processing, the request identifier contained in the definition of the view:
create or replace view test_up_chains as select * from ( -- collect minimal/maximal reachable id for each edge in G select distinct inter.id1 , inter.id2 , min(minelt) chainstart , max(maxelt) chainend from ( -- collect minimum/maximum terminal id for chains starting/ending with any of G edges select distinct inter_min.id1 , inter_min.id2 , inter_min.minelt , inter_max.maxelt from ( -- hierarchical query for maximal chains in D_1 select distinct hup.ID1 , hup.ID2 , CONNECT_BY_ROOT hup.id2 maxelt from test_unique_pairs hup connect by nocycle prior hup.id1 = hup.id2 start with hup.id2 in ( select ref1.id2 from test_unique_pairs ref1 where not exists ( select 1 from test_unique_pairs ref2 where ref2.id1 = ref1.id2 ) ) ) inter_max join ( -- hierarchical query for maximal chains in D_2 ( = D_1 with reversed edge orientations ) select distinct hup.ID1 , hup.ID2 , CONNECT_BY_ROOT hup.id1 minelt from test_unique_pairs hup connect by nocycle prior hup.id2 = hup.id1 start with hup.id1 in ( select ref1.id1 from test_unique_pairs ref1 where not exists ( select 1 from test_unique_pairs ref2 where ref2.id2 = ref1.id1 ) ) ) inter_min on ( inter_min.id1 = inter_max.id1 AND inter_min.id2 = inter_max.id2 ) ) inter group by grouping sets ( (inter.id1, inter.id2) ) ) base order by base.id1 , base.id2 ;
Then edges B are combined in WCC and their corresponding minimum identifiers are extracted. The following hierarchical query, devoid of the STARTS WITH , builds a complete transitive closure with respect to the edge incident, and for each edge in B creates root paths to all other edges in the same WCC. Thus, minimizing the starting vertex of the root of the entire hierarchy, taking into account the edges in G finds the minimum identifier of the reachable in G Cycles are suppressed by the NOCYCLE directive.
select * from ( select distinct id1 , id2 , min(chainlim) chainrep from ( select distinct id1 , id2 , connect_by_root chainstart chainlim from test_up_chains connect by nocycle ( ( prior chainstart = chainstart and prior chainend != chainend ) OR ( prior chainstart != chainstart and prior chainend = chainend ) ) ) base group by grouping sets ( ( base.id1, base.id2 ) ) ) order by id1 , id2 ;