Search for a loop of 3 nodes (or triangles) in a graph

I work with complex networks. I want to find a group of nodes that forms a cycle of 3 nodes (or triangles) in a given graph. Since my graph contains about a million edges, using a simple iterative solution (plural) for a loop is not very efficient.

I use python for my programming, if these are some built-in modules to solve these problems, please let me know.

If someone knows any algorithm that can be used to search for triangles in graphs, kindly answer.

+7
python graph cycle geometry
source share
11 answers

A million ribs are quite small. If you don’t do it a thousand times, just use a naive implementation.

I assume that you have a node_ids dictionary that indicates the sequence of their neighbors and that the graph is directed.

For example:

nodes = {} nodes[0] = 1,2 nodes[1] = tuple() # empty tuple nodes[2] = 1 

My decision:

 def generate_triangles(nodes): """Generate triangles. Weed out duplicates.""" visited_ids = set() # remember the nodes that we have tested already for node_a_id in nodes: for node_b_id in nodes[node_a_id]: if nod_b_id == node_a_id: raise ValueError # nodes shouldn't point to themselves if node_b_id in visited_ids: continue # we should have already found b->a->??->b for node_c_id in nodes[node_b_id]: if node_c_id in visited_ids: continue # we should have already found c->a->b->c if node_a_id in nodes[node_c_id]: yield(node_a_id, node_b_id, node_c_id) visited_ids.add(node_a_id) # don't search a - we already have all those cycles 

Performance Check:

 from random import randint n = 1000000 node_list = range(n) nodes = {} for node_id in node_list: node = tuple() for i in range(randint(0,10)): # add up to 10 neighbors try: neighbor_id = node_list[node_id+randint(-5,5)] # pick a nearby node except: continue if not neighbor_id in node: node = node + (neighbor_id,) nodes[node_id] = node cycles = list(generate_triangles(nodes)) print len(cycles) 

When I tried this, it took more time to build a random graph than counting loops.

You might want to test it;) I will not guarantee that it will be correct.

You can also look in networkx, which is a large python graph library.

+4
source share

I don't want to sound harsh, but have you tried Google to do this? The first link is a pretty quick algorithm for this: http://www.mail-archive.com/ algogeeks@googlegroups.com /msg05642.html

And here is this article about ACM (to which you can have access): http://portal.acm.org/citation.cfm?id=244866 (and if you do not have access, I am sure that if you kindly ask the lady who wrote it, you will get a copy.)

In addition, I can imagine a triangle enumeration method based on click-decomposition, but I do not know if this has been described somewhere.

+2
source share

A fairly simple and straightforward way is to use Networkx:

With Networkx, you can get the loops of the nx.cycle_basis (G) undirected graph, and then select those that have 3 nodes

 cycls_3 = [c for c in nx.cycle_basis(G) if len(c)==3] 

or you can find all clicks of find_cliques (G) and then select the ones you want (with 3 nodes). cliques are parts of a graph where all nodes are connected to each other, which happens in loops / loops with three nodes.

+2
source share

Assuming its undirected graph, the answer lies in the python networkx library. if you just need to count triangles, use:

 import networkx as nx tri=nx.triangles(g) 

But if you need to know the list of edges with a triangular (triad) relation, use

 all_cliques= nx.enumerate_all_cliques(g) 

This will give you all the clicks (k = 1,2,3 ... max degree - 1)

So, for filtering only triangles ie k = 3,

 triad_cliques=[x for x in all_cliques if len(x)==3 ] 

Triad_cliques will present a list of edges with only triangles.

+2
source share

Although this is not efficient, you might want to implement a solution, so use loops. Write a test to get an idea of ​​how long it will take.

Then, when you try to use new approaches, you can do two things: 1) Make sure the answer remains the same. 2) Look at what improvement is.

Having a faster algorithm that skips something is likely to be worse than a slower one.

After a slow test, you can see if you can do this in parallel and see what an increase in performance is.

Then you can see if you can mark all nodes that have less than 3 vertices.

Ideally, you can first cut to 100 or so that you can draw it and see what happens graphically.

Sometimes your brain will see a pattern that is not so obvious when considering algorithms.

+1
source share

I am working on the same problem of counting the number of triangles on an undirected graph, and a witty solution works fine in my case. I changed it a bit, so only undirected triangles are taken into account.

  #### function for counting undirected cycles def generate_triangles(nodes): visited_ids = set() # mark visited node for node_a_id in nodes: temp_visited = set() # to get undirected triangles for node_b_id in nodes[node_a_id]: if node_b_id == node_a_id: raise ValueError # to prevent self-loops, if your graph allows self-loops then you don't need this condition if node_b_id in visited_ids: continue for node_c_id in nodes[node_b_id]: if node_c_id in visited_ids: continue if node_c_id in temp_visited: continue if node_a_id in nodes[node_c_id]: yield(node_a_id, node_b_id, node_c_id) else: continue temp_visited.add(node_b_id) visited_ids.add(node_a_id) 

Of course you need to use a dictionary like

  #### Test cycles #### nodes = {} nodes[0] = [1, 2, 3] nodes[1] = [0, 2] nodes[2] = [0, 1, 3] nodes[3] = [1] cycles = list(generate_triangles(nodes)) print cycles 

Using the Vista code, the triangles found will be [(0, 1, 2), (0, 2, 1), (0, 3, 1), (1, 2, 3)]

which considered the triangle (0, 1, 2) and (0, 2, 1) to be two different triangles. With a modified code, they are considered only one triangle.

I used this with a relatively small dictionary of less than 100 keys, and each key has an average of 50 values.

+1
source share

Do you need to find "all" of the triangles or just "some" / "any"? Or maybe you just need to check if a particular node is part of a triangle?

The test is simple - given node A, are there any two connected nodes B and C that are also directly connected.

If you need to find all the triangles - in particular, all groups of three nodes in which each node is connected to the other two - then you need to check each possible group in a very long "for everyone" mode, loop.

The only optimization is that you do not check the same “group” twice, for example. if you have already verified that B and C are not in the group with A, then do not check if A and C are in the group with B.

0
source share

Surprised when not mentioning the function of Networkx triangles. I know that this does not necessarily return the groups of nodes that form the triangle, but they should be very important for many who are on this page.

 nx.triangles(G) # list of how many triangles each node is part of sum(nx.triangles(G).values())/3 # total number of triangles 

An alternative way to return node clusters would look like ...

 for u,v,d in G.edges(data=True): u_array = adj_m.getrow(u).nonzero()[1] # get lists of all adjacent nodes v_array = adj_m.getrow(v).nonzero()[1] # find the intersection of the two sets - these are the third node of the triangle np.intersect1d(v_array,u_array) 
0
source share

If you care about several copies of the same triangle in a different order, then a list of 3 tuples works:

 from itertools import combinations as combos [(n,nbr,nbr2) for n in G for nbr, nbr2 in combos(G[n],2) if nbr in G[nbr2]] 

The logic here is to check each pair of neighbors of each node to see if they are connected. G[n] is a quick way to search or search for neighbors.

If you want to get rid of reordering, turn each triple into a freezonet and create a set of freezons:

 set(frozenset([n,nbr,nbr2]) for n in G for nbr, nbr2 in combos(G[n]) if nbr in G[nbr2]) 

If you do not like frozenset and you need a list of sets, then:

 triple_iter = ((n, nbr, nbr2) for n in G for nbr, nbr2 in combos(G[n],2) if nbr in G[nbr2]) triangles = set(frozenset(tri) for tri in triple_iter) nice_triangles = [set(tri) for tri in triangles] 
0
source share

This is a more efficient version of Ajay M answer (I would comment on this, but I don't have enough reputation).

Indeed, the enumerate_all_cliques networkx method will return all clicks on the graph, regardless of their length; therefore, looping through it can take a lot of time (especially with very tight graphs).

Moreover, once defined for triangles, it’s just a parameterization issue to generalize the method for each click length, so here is the function:

 import networkx as nx def get_cliques_by_length(G, length_clique): """ Return the list of all cliques in an undirected graph G with length equal to length_clique. """ cliques = [] for c in nx.enumerate_all_cliques(G) : if len(c) <= length_clique: if len(c) == length_clique: cliques.append(c) else: return cliques # return empty list if nothing is found return cliques 

To get triangles, just use get_cliques_by_length(G, 3) .

Caution : this method only works for undirected graphs. Algorithm for clicks in directed graphs is not specified in networkx

0
source share

I just found that nx.edge_disjoint_paths works to count a triangle containing specific edges. faster than nx.enumerate_all_cliques and nx.cycle_basis . It returns the disjoint paths of the edges between the source and target. Disjoint edge paths are paths that do not have a common edge.
And result-1 is the number of triangles that contain specific edges either between the source node and the target node.

 edge_triangle_dict = {} for i in g.edges: edge_triangle_dict[i] = len(list(nx.edge_disjoint_paths(g, i[0], i[1]))-1) 
0
source share

All Articles