Efficient bipartite graph projection in python (using networkx)

Using the networkx module, I do some network analysis in Python 3.2, where I need to project a bipartite graph (of prisoners associated with their cell: input graph B in the code below) onto a subgraph (linking cellmates to each other if both have overlapping spells in one and in the same cell: using the input of the given nodes defining the nodal nodes of graph B, generating the output graph G). I do not need a special algorithm that is suitable for any or optimal match, I just need to collect all the links that satisfy certain conditions. Thus, the other SO posts I found do not really apply. But:

My current code explodes (RAM-, swap- and CPU-wise) as I give it more and more data. Please let me know if you see ways to arrange the code below using 5 layers of loops. I'm not sure if knowledge of networkx is necessary or the details of my edge attribute labeling are relevant. Thanks!

def time_overlap_projected_graph_parallel(B, nodes): G=nx.MultiGraph() G.add_nodes_from((n,B.node[n]) for n in nodes) for u in nodes: unbrs = set(B[u]) nbrs2 = set((n for nbr in unbrs for n in B[nbr])) - set([u]) for v in nbrs2: for mutual_cell in set(B[u]) & set(B[v]): for uspell in B.get_edge_data(u,mutual_cell).values(): ustart = uspell[1] uend = uspell[2] for vspell in B.get_edge_data(v,mutual_cell).values(): vstart = vspell[1] vend = vspell[2] if uend > vstart and vend > ustart: ostart = max(ustart,vstart) oend = min(uend,vend) olen = (oend-ostart+1)/86400 ocell = mutual_cell if (v not in G[u] or ostart not in [ edict[1] for edict in G[u][v].values() ]): G.add_edges_from([(u,v,{0: olen,1: ostart,2: oend,3: ocell})]) return G 
+4
source share
4 answers

Now you can use bipartite graphics. how

 import networkx as nx from networkx.algorithms import bipartite B.add_nodes_from(inmates_list, bipartite=0) B.add_nodes_from(cells_list, bipartite=1) inmates = set(n for n,d in B.nodes(data=True) if d['bipartite']==0) cells = = set(B) - inmates G = bipartite.projected_graph(B, inmates) 

( http://networkx.lanl.gov/reference/algorithms.bipartite.html )

+2
source

Here is my welcome. Depending on the average prisoners per cell, this can increase productivity. If you have a better way to get cells (e.g. node properties?), Replace

 cells = [n for n in B.nodes() if n[0] not in nodes] 

with this (here I assume the nodes are a list of all prisoners).

 from itertools import combinations def time_overlap_projected_graph_parallel(B, nodes): G=nx.MultiGraph() G.add_nodes_from((n,B.node[n]) for n in nodes) cells = [n for n in B.nodes() if n[0] not in nodes] for cell in cells: for u,v in combinations(B[cell],2): for uspell in B.get_edge_data(u,cell).values(): ustart = uspell[1] uend = uspell[2] for vspell in B.get_edge_data(v,cell).values(): vstart = vspell[1] vend = vspell[2] if uend > vstart and vend > ustart: ostart = max(ustart,vstart) oend = min(uend,vend) olen = (oend-ostart+1)/86400 ocell = cell if (v not in G[u] or ostart not in [ edict[1] for edict in G[u][v].values() ]): G.add_edge(u,v,{0: olen,1: ostart,2: oend,3: ocell}) return G 
+1
source

I am posting this answer to suggest a few improvements. I assume that your bipartite graph is not multigraphic, but regular nx.Graph . I changed B to bi and G to uni , as uppercase names are reserved for classes by convention. By the way, what if spells started and ended at the same time?

 def time_overlap_projected_graph_parallel(bi, nodes): uni = nx.MultiGraph() for u in nodes: #inmate uni.add_node(u) # do this to prevent iterating nodes twice u_adj = bi.adj[u] # bi.adj is a dict of dicts for (w, uw_attr) in u_adj.iteritems(): # cell w_adj = bi.adj[w] for (v, wv_attr) in w_adj.iteritems():#cellmate if v == u: continue elif uni.has_edge(u, v): # avoid computing twice continue for uspell in uw_attr.itervalues(): ustart = uspell[1] uend = uspell[2] for vspell in wv_attr.itervalues(): vstart = vspell[1] vend = vspell[2] if uend > vstart and vend > ustart: ostart = max(ustart, vstart) oend = min(uend, vend) olen = (oend - ostart + 1) / 86400 # this assumes floats or Python 3 otherwise will be 0 ocell = w # I would change this to uni.add_edge(u, v, length=olen, start=ostart, end=oend, cell=ocell) # or to uni.add_edge(u, v, spell=[olen, ostart, oend, ocell]) uni.add_edge(u, v, **{0: olen, 1: ostart, 2: oend, 3: ocell}) return uni 
+1
source

Consider a revised code that can do the same thing, but with iterators (although I also reviewed a few network-related functions / call methods --- but I still check that something is broken):

 def time_overlap_projected_graph_parallel(B, nodes): G=nx.MultiGraph() G.add_nodes_from(nodes) for u in G.nodes_iter():#inmate for w in B.neighbors_iter(u):#cell for v in B.neighbors_iter(w):#cellmate if v == u: continue for uspell in B[u][w].values(): ustart = uspell[1] uend = uspell[2] for vspell in B[v][w].values(): vstart = vspell[1] vend = vspell[2] if uend > vstart and vend > ustart: ostart = max(ustart,vstart) oend = min(uend,vend) olen = (oend-ostart+1)/86400 ocell = w if (v not in G[u] or ostart not in [ edict[1] for edict in G[u][v].values() ]): G.add_edges_from([(u,v,{0: olen,1: ostart,2: oend,3: ocell})]) return G 
0
source

All Articles