Voice of the Google interview algorithm: the expected size of the largest connected component in a random simple graph (N nodes, N edges)?

For a random simple graph of nodes N , edges N and uniform probability of edge p, what is the expected size of the largest connected component?

  • The only input parameter is N , which represents the number of nodes in the graph.
  • The graph is simple, which means that it is not allowed to contain any self-tests.
  • There are N node pairs, i.e. the egist is of the form {(1,a), (2,b), (3,c), ..., (N,[next available letter of the alphabet])} . The condition is that a! = 1, b! = 2, c! = 3, etc., but besides this, a, b, c, ... can take any value from 1 to N inclusive
  • node n forming an edge with some other node can be described using a uniform probability distribution
  • In fact, such an admissible graph constructed from nodes N and N will find the size of the largest connected component S
  • Under a connected component, this is defined as the largest cluster / group of nodes, where each node in a connected component can reach / reach from all other nodes in the same connected component

The question arises: among all such regular graphs constructed, the expected value of S is expected?

I would like to learn about an effective way to think / analyze / solve this problem, which can deal with N in the range from 2 to 50 inclusive. Some code for illustration would be helpful to my understanding.


I wrote the following sources for naive brute force capabilities, hoping to find a pattern from there. Managed to get to N = 9 using these:

Both attempts to generate all possible permutations in accordance with the criteria specified in the problem, and then calculate S for each graph generated through BFS.

Data output

Regarding the format, for example. for the line " N=4: {2:3,4:78} 106/27 " this means that there are 3 graphs with the size of the largest component = 2 and 78 graphs with the size of the largest component = 4, so the final answer = (3/(78+3))*2 + (78/(78+3))*4 = 106/27 .

 N=2: {2:1} 2/1 N=3: {3:8} 3/1 N=4: {2:3,4:78} 106/27 N=5: {3:80,5:944} 155/32 N=6: {2:15,3:640,4:1170,6:13800} 17886/3125 N=7: {3:840,4:21840,5:19824,7:237432} 38563/5832 N=8: {2:105,3:17920,4:229320,5:422912,6:386400,8:4708144} 6152766/823543 N=9: {3:153440,4:786240,5:9634464,6:9273600,7:8547552,9:105822432} 17494593/2097152 

edit: The just discovered OEIS Sequence # A000435 gives answers (formula / enumeration up to N = 18) for graph numbers with N nodes and the largest component of size = N. This exactly matches my bruteforce result, for example. for N = 9 there are 105822432 graphs with the largest connected component size = 9. Not sure if this helps - one of the formulas given here to get this for a larger N is a(n) = (n-1)! * Sum (n^k/k!, k=0..n-2) a(n) = (n-1)! * Sum (n^k/k!, k=0..n-2) Python implementation


Example for N = 4 :

There are a total of 81 possible boundary assignments for 4 nodes (based on index 1 from 1 to N ).

The format of the output example is below: ((1, 2), (2, 1), (3, 1), (4, 1)): 4 means that there are edges between nodes 1 and 2, nodes 2 and 1, nodes 3 and 1, as well as nodes 4 and 1. Assume that the edges are not oriented / bidirectional. For such a graph, there is only one communication component from all 4 nodes {1,2,3,4}, so the size of the largest (single) communication component is 4.

After creating this list, the expected value of S can be calculated through (fraction of all 81 instances where S == 4) * 4 + (fraction of all 81 instances where S == 2) * 2 = 106/27 - since the only values ​​of S are 2.4.

 {((1, 2), (2, 1), (3, 1), (4, 1)): 4, ((1, 2), (2, 1), (3, 1), (4, 2)): 4, ((1, 2), (2, 1), (3, 1), (4, 3)): 4, ((1, 2), (2, 1), (3, 2), (4, 1)): 4, ((1, 2), (2, 1), (3, 2), (4, 2)): 4, ((1, 2), (2, 1), (3, 2), (4, 3)): 4, ((1, 2), (2, 1), (3, 4), (4, 1)): 4, ((1, 2), (2, 1), (3, 4), (4, 2)): 4, ((1, 2), (2, 1), (3, 4), (4, 3)): 2, ((1, 2), (2, 3), (3, 1), (4, 1)): 4, ((1, 2), (2, 3), (3, 1), (4, 2)): 4, ((1, 2), (2, 3), (3, 1), (4, 3)): 4, ((1, 2), (2, 3), (3, 2), (4, 1)): 4, ((1, 2), (2, 3), (3, 2), (4, 2)): 4, ((1, 2), (2, 3), (3, 2), (4, 3)): 4, ((1, 2), (2, 3), (3, 4), (4, 1)): 4, ((1, 2), (2, 3), (3, 4), (4, 2)): 4, ((1, 2), (2, 3), (3, 4), (4, 3)): 4, ((1, 2), (2, 4), (3, 1), (4, 1)): 4, ((1, 2), (2, 4), (3, 1), (4, 2)): 4, ((1, 2), (2, 4), (3, 1), (4, 3)): 4, ((1, 2), (2, 4), (3, 2), (4, 1)): 4, ((1, 2), (2, 4), (3, 2), (4, 2)): 4, ((1, 2), (2, 4), (3, 2), (4, 3)): 4, ((1, 2), (2, 4), (3, 4), (4, 1)): 4, ((1, 2), (2, 4), (3, 4), (4, 2)): 4, ((1, 2), (2, 4), (3, 4), (4, 3)): 4, ((1, 3), (2, 1), (3, 1), (4, 1)): 4, ((1, 3), (2, 1), (3, 1), (4, 2)): 4, ((1, 3), (2, 1), (3, 1), (4, 3)): 4, ((1, 3), (2, 1), (3, 2), (4, 1)): 4, ((1, 3), (2, 1), (3, 2), (4, 2)): 4, ((1, 3), (2, 1), (3, 2), (4, 3)): 4, ((1, 3), (2, 1), (3, 4), (4, 1)): 4, ((1, 3), (2, 1), (3, 4), (4, 2)): 4, ((1, 3), (2, 1), (3, 4), (4, 3)): 4, ((1, 3), (2, 3), (3, 1), (4, 1)): 4, ((1, 3), (2, 3), (3, 1), (4, 2)): 4, ((1, 3), (2, 3), (3, 1), (4, 3)): 4, ((1, 3), (2, 3), (3, 2), (4, 1)): 4, ((1, 3), (2, 3), (3, 2), (4, 2)): 4, ((1, 3), (2, 3), (3, 2), (4, 3)): 4, ((1, 3), (2, 3), (3, 4), (4, 1)): 4, ((1, 3), (2, 3), (3, 4), (4, 2)): 4, ((1, 3), (2, 3), (3, 4), (4, 3)): 4, ((1, 3), (2, 4), (3, 1), (4, 1)): 4, ((1, 3), (2, 4), (3, 1), (4, 2)): 2, ((1, 3), (2, 4), (3, 1), (4, 3)): 4, ((1, 3), (2, 4), (3, 2), (4, 1)): 4, ((1, 3), (2, 4), (3, 2), (4, 2)): 4, ((1, 3), (2, 4), (3, 2), (4, 3)): 4, ((1, 3), (2, 4), (3, 4), (4, 1)): 4, ((1, 3), (2, 4), (3, 4), (4, 2)): 4, ((1, 3), (2, 4), (3, 4), (4, 3)): 4, ((1, 4), (2, 1), (3, 1), (4, 1)): 4, ((1, 4), (2, 1), (3, 1), (4, 2)): 4, ((1, 4), (2, 1), (3, 1), (4, 3)): 4, ((1, 4), (2, 1), (3, 2), (4, 1)): 4, ((1, 4), (2, 1), (3, 2), (4, 2)): 4, ((1, 4), (2, 1), (3, 2), (4, 3)): 4, ((1, 4), (2, 1), (3, 4), (4, 1)): 4, ((1, 4), (2, 1), (3, 4), (4, 2)): 4, ((1, 4), (2, 1), (3, 4), (4, 3)): 4, ((1, 4), (2, 3), (3, 1), (4, 1)): 4, ((1, 4), (2, 3), (3, 1), (4, 2)): 4, ((1, 4), (2, 3), (3, 1), (4, 3)): 4, ((1, 4), (2, 3), (3, 2), (4, 1)): 2, ((1, 4), (2, 3), (3, 2), (4, 2)): 4, ((1, 4), (2, 3), (3, 2), (4, 3)): 4, ((1, 4), (2, 3), (3, 4), (4, 1)): 4, ((1, 4), (2, 3), (3, 4), (4, 2)): 4, ((1, 4), (2, 3), (3, 4), (4, 3)): 4, ((1, 4), (2, 4), (3, 1), (4, 1)): 4, ((1, 4), (2, 4), (3, 1), (4, 2)): 4, ((1, 4), (2, 4), (3, 1), (4, 3)): 4, ((1, 4), (2, 4), (3, 2), (4, 1)): 4, ((1, 4), (2, 4), (3, 2), (4, 2)): 4, ((1, 4), (2, 4), (3, 2), (4, 3)): 4, ((1, 4), (2, 4), (3, 4), (4, 1)): 4, ((1, 4), (2, 4), (3, 4), (4, 2)): 4, ((1, 4), (2, 4), (3, 4), (4, 3)): 4} 
+7
language-agnostic algorithm graph-theory combinatorics
source share
2 answers

I am going to start with some assumptions regarding the problem:

  • The value of the connected component: consider the transformation of the graph, where each edge becomes non-directional. Two vertices are considered connected if there is a transformed (undirected) graph between them.
  • If there exists an (directed) edge (i, j) connecting two vertices i, j, there can still be another (directed) edge (j, i) starting from j.

C (k, n) will denote the binomial coefficient "n choose k".

Analysis

Let G i (N) be the subset i of the vertices of the graph G (N) (such subsets C (i, N)). G i (N) can determine the associated component of size i only if the following two conditions are true:

  • Edges starting at the vertices G i (N) end with an element G i (N): event e inGi
  • Edges starting at the vertices G \ G i (N) end with G \ G i (N): event e outGi

These two events (e inGi and e outGi ) are independent, because edges are selected independently during plotting.

In addition, in G i <there can be no more than one cycle (i.e., edges with the form (i, j), (j, k), ..., (l, m), (m, i)) . More than one cycle will lead to more than one connected component at the vertices i we are considering, since there is only one edge starting from a given vertex. This defines the event of e 2-cycles .

Thus, the total probability of having a related component of size i defined by a subset of G i ( event e i-cc ) is:

p (e i-cc ) = p (e inGi ) .p (e 2-cycles | e inGi ). p (e outGi )

The presence of a connected component of size at least i is equivalent to the presence of a connected component of size i, except this time that we remove the restrictions imposed by the e outGi event. This defines the event e minimum, I


Probability Calculation

The probability of having at least two cycles in an isolated set of K vertices is:

2cycles

Proof: To define two cycles, we need two sets of vertices containing at least two vertices in each. Let k 1 and k 2 be the number of vertices in these sets.

Defining a cycle on a set of vertices is equivalent to choosing a circular order and linking the vertices. With a regular order, for a set 1 (respectively 2) defining a circular order, there are k 1 (respectively k 2 ) other orders that lead to the same cyclic order (they belong to the same equivalence class). There is so (k 1 -1)! (respectively k 2 -1)!) cyclic orders for set 1 (respectively 2). This statement explains (X -1)! some formula.

We take i elements among K, which we consider (i> 3). We are interested in building two cycles with these elements i (regardless of the remaining elements K - i). Since we ignore the rest, we must use the balance factor to compensate for the fact that we are going to read the same cycles several times (hence the bit 1 + C(i, Ki) ). We divide these elements i into two sets (hence the sum j=2..i/2 ). We take j elements among i. There are no remaining elements in i elements. However, in the case when j = ij, we will count exactly every pair of cycles twice, so we balance using the Kronecker symbol ( delta(i,j) = 1 iif (i = j) )

We have two sets, now we count the number of cycles ((k 1 -1)!. K 1 -1)! using k 1 = j and k 2 = ij) and divide them by the number of possible ways to construct edges at the vertices i ( (K-1)**i )

Note. It would be advisable to obtain an expert assessment of this probability.


Now we can calculate the probability of the presence of a connected component of power K in a graph of N vertices and edges:

pcceq

Note: the first term denotes a selection of K vertices. The odd part 1 + C(...) is a balance factor that takes into account the fact that if there is a K-combination X, X will appear on the β€œNK” side when considering some other K-combinations.

The next two terms represent the mutual isolation between the K selected vertices and the remaining NK vertices of the graph.


In the same way, the probability of having a connected component of size at least K is the same as having a connected component of size K, except that we remove the term corresponding to the e outGi event described earlier:

pccsup


So far, so good. The fact is that there can be more than one related component in a graph! So we cannot just average the formula above ...

The probability of having a maximum connected component of size i can be expressed using the probabilities p k (N): this is a combination of the following events:

  • Having a related component of cardinality I
  • The absence of a connected power component of at least i + 1 in the Ni remaining vertices.

These two events are independent, because the connected component of power i does not intersect with the rest of the graph.

Thus, the probability that the maximum related component has size K is:

p max (K, N) = p CCeq (K, N). (1 - p CCsup (K + 1, NK))

It remains only to average this probability. The expected size S (N) of the largest connected component in such a random graph with N vertices and edges:

S (N) = Ξ£ 2≀K≀N Kp max (K, N)


Quick example

Let N = 5. We have:

  • p max (1,5) = 0
  • p max (2,5) = 0
  • p max (3,5) = C (3,5). (1/2) 3 . (1/4) 2 = 10 / (2,4 3 ) = 80 / (4 5 )
  • p max (4,5) = 0
  • p max (5.5) = (1 - p 2 cycles (5)) = 1 - 80 / (4 5 )

Thus, p max (k, 5) is always 0, with the exception of k = 3 and k = 5. The cycle formula gives 20 pairs of 2- / 3 cycle cycles according to 4 possible schemes 5 and 15 pairs of 2- / 2- cycle 4 possible schemes 4 . Thus, the sum is 80 / (4 5 ), which is exactly equal to p max (3,5).


I suppose you can now use a computer to get real values ​​from this formula? :)

Disclaimer: I checked the 2- cycle formula for N = 3,4,5 manually, and it seems to hold well. However, an external review will be very helpful. Please note that these calculations were made based on the assumptions found at the beginning of my answer.

Again, thanks @DavidEisenstat for his helpful comment that led to this more rigorous (and hopefully correct) answer.

0
source

This is not a solution, but some observations.

First, with n edges and n vertices, and each vertex is connected to some other vertex, except for itself, any strongly connected component of size k in this graph must have exactly k edges. It follows from

  • any SCC with size k must have at least k-1 edges. In this graph, it is impossible for any SCC to have exactly k-1 edges, because if in this case there must be a vertex in the SCC that is connected to another vertex, and not in the SCC, then the assumption that this SCC has size is invalid k.
  • In this graph, it is also impossible for any SCC to have more than k + 1 edges, since each vertex is connected to exactly one other vertex.

Now suppose we want to calculate the total number of admissible graphs constructed in this question, and dp[n][k] is the total number of admissible graphs with n vertices and a maximum SCC of size k. Then dp[n][k] is

  • Somehow we select k vertices from n vertices and form SCC, and
  • The remaining nk vertices should not have an SCC larger than k.

Therefore, the wording DP becomes:

 dp[n][k]=C(n,k)*dp[nk][k] + C(n,k)*dp[nk][k-1] + C(n,k)*dp[nk][k-2] + ... C(n,k)*dp[nk][2] + C(n,k)*dp[nk][1] 

where C(n,k) n!/[(nk)!k!]

(I'm not sure if this is correct, but hopefully in the right direction)

The total number of allowed graphs is n vertices (n-1)**(n-1) . The expected size of the largest SCC can be designed according to the basic waiting principle.

0
source

All Articles