I am trying to implement the above community discovery algorithm in Java, and although I have access to the C ++ code and the original document, I cannot get it to work at all. My main problem is that I do not understand the purpose of the code, that is, how the algorithm works. From a practical point of view, my code is stuck in what seems like an endless loop in mergeBestQ , the heap list seems to get bigger at each iteration (as I would expect from the code), but the topQ value always returns the same value.
The graph I'm testing is quite large (300,000 nodes, 650,000 edges). The source code that I use for my implementation is from the SNAP library ( https://github.com/snap-stanford/snap/blob/master/snap-core/cmty.cpp ). It would be great if someone could explain the intuition of the algorithm to me, it seems to initially establish that each node should be in its own community, and then write down the modularity value (regardless of what it is) of each pair of connected nodes on the graph, then finding a pair of nodes that have the highest modularity and move them to the same community. Also, if someone could provide some mid-level pseudo code, that would be great. Here is my implementation so far, I tried to save it in one file for the sake of brevity, however CommunityGraph and CommunityNode are in a different place (not necessary). The graph maintains a list of all nodes, and each node maintains a list of its connections to other nodes. When launched, it never passes the while(this.mergeBestQ()){} line while(this.mergeBestQ()){}
UPDATE - found some errors in my code after a thorough analysis. The code now completes VERY quickly, but does not fully implement the algorithm, for example, 300,000 nodes on the graph; it claims to be about 299,000 communities (i.e., approximately 1 node per community). I have listed the updated code below. /// Clauset -Newman-Moore community discovery method. /// At each step, two communities are combined that contribute the maximum positive value to global modularity. /// See: Searching for community structures in very large networks, A. Clauset, MEJ Newman, C. Moore, 2004 open class CNMMCommunityMetric implements CommunityMetric {private static class DoubleIntInt implements Comparable {public double val1; public int val2; public int val3; DoubleIntInt (double val1, int val2, int val3) {this.val1 = val1; this.val2 = val2; this.val3 = val3; }
@Override public int compareTo(DoubleIntInt o) { //int this_sum = this.val2 + this.val3; //int oth_sum = o.val2 + o.val3; if(this.equals(o)){ return 0; } else if(val1 < o.val1 || (val1 == o.val1 && val2 < o.val2) || (val1 == o.val1 && val2 == o.val2 && val3 < o.val3)){ return 1; } else{ return -1; } //return this.val1 < o.val1 ? 1 : (this.val1 > o.val1 ? -1 : this_sum - oth_sum); } @Override public boolean equals(Object o){ return this.val2 == ((DoubleIntInt)o).val2 && this.val3 == ((DoubleIntInt)o).val3; } @Override public int hashCode() { int hash = 3; hash = 79 * hash + this.val2; hash = 79 * hash + this.val3; return hash; } } private static class CommunityData { double DegFrac; TIntDoubleHashMap nodeToQ = new TIntDoubleHashMap(); int maxQId; CommunityData(){ maxQId = -1; } CommunityData(double nodeDegFrac, int outDeg){ DegFrac = nodeDegFrac; maxQId = -1; } void addQ(int NId, double Q) { nodeToQ.put(NId, Q); if (maxQId == -1 || nodeToQ.get(maxQId) < Q) { maxQId = NId; } } void updateMaxQ() { maxQId=-1; int[] nodeIDs = nodeToQ.keys(); double maxQ = nodeToQ.get(maxQId); for(int i = 0; i < nodeIDs.length; i++){ int id = nodeIDs[i]; if(maxQId == -1 || maxQ < nodeToQ.get(id)){ maxQId = id; maxQ = nodeToQ.get(maxQId); } } } void delLink(int K) { int NId=getMxQNId(); nodeToQ.remove(K); if (NId == K) { updateMaxQ(); } } int getMxQNId() { return maxQId; } double getMxQ() { return nodeToQ.get(maxQId); } }; private TIntObjectHashMap<CommunityData> communityData = new TIntObjectHashMap<CommunityData>(); private TreeSet<DoubleIntInt> heap = new TreeSet<DoubleIntInt>(); private HashMap<DoubleIntInt,DoubleIntInt> set = new HashMap<DoubleIntInt,DoubleIntInt>(); private double Q = 0.0; private UnionFind uf = new UnionFind(); @Override public double getCommunities(CommunityGraph graph) { init(graph); //CNMMCommunityMetric metric = new CNMMCommunityMetric(); //metric.getCommunities(graph); // maximize modularity while (this.mergeBestQ(graph)) { } // reconstruct communities HashMap<Integer, ArrayList<Integer>> IdCmtyH = new HashMap<Integer, ArrayList<Integer>>(); Iterator<CommunityNode> ns = graph.getNodes(); int community = 0; TIntIntHashMap communities = new TIntIntHashMap(); while(ns.hasNext()){ CommunityNode n = ns.next(); int r = uf.find(n); if(!communities.contains(r)){ communities.put(r, community++); } n.setCommunity(communities.get(r)); } System.exit(0); return this.Q; } private void init(Graph graph) { double M = 0.5/graph.getEdgesList().size(); Iterator<Node> ns = graph.getNodes(); while(ns.hasNext()){ Node n = ns.next(); uf.add(n); int edges = n.getEdgesList().size(); if(edges == 0){ continue; } CommunityData dat = new CommunityData(M * edges, edges); communityData.put(n.getId(), dat); Iterator<Edge> es = n.getConnections(); while(es.hasNext()){ Edge e = es.next(); Node dest = e.getStart() == n ? e.getEnd() : e.getStart(); double dstMod = 2 * M * (1.0 - edges * dest.getEdgesList().size() * M);//(1 / (2 * M)) - ((n.getEdgesList().size() * dest.getEdgesList().size()) / ((2 * M) * (2 * M)));// * (1.0 - edges * dest.getEdgesList().size() * M); dat.addQ(dest.getId(), dstMod); } Q += -1.0 * (edges*M) * (edges*M); if(n.getId() < dat.getMxQNId()){ addToHeap(createEdge(dat.getMxQ(), n.getId(), dat.getMxQNId())); } } } void addToHeap(DoubleIntInt o){ heap.add(o); } DoubleIntInt createEdge(double val1, int val2, int val3){ DoubleIntInt n = new DoubleIntInt(val1, val2, val3); if(set.containsKey(n)){ DoubleIntInt n1 = set.get(n); heap.remove(n1); if(n1.val1 < val1){ n1.val1 = val1; } n = n1; } else{ set.put(n, n); } return n; } void removeFromHeap(Collection<DoubleIntInt> col, DoubleIntInt o){ //set.remove(o); col.remove(o); } DoubleIntInt findMxQEdge() { while (true) { if (heap.isEmpty()) { break; } DoubleIntInt topQ = heap.first(); removeFromHeap(heap, topQ); //heap.remove(topQ); if (!communityData.containsKey(topQ.val2) || ! communityData.containsKey(topQ.val3)) { continue; } if (topQ.val1 != communityData.get(topQ.val2).getMxQ() && topQ.val1 != communityData.get(topQ.val3).getMxQ()) { continue; } return topQ; } return new DoubleIntInt(-1.0, -1, -1); } boolean mergeBestQ(Graph graph) { DoubleIntInt topQ = findMxQEdge(); if (topQ.val1 <= 0.0) { return false; } // joint communities int i = topQ.val3; int j = topQ.val2; uf.union(i, j); Q += topQ.val1; CommunityData datJ = communityData.get(j); CommunityData datI = communityData.get(i); datI.delLink(j); datJ.delLink(i); int[] datJData = datJ.nodeToQ.keys(); for(int _k = 0; _k < datJData.length; _k++){ int k = datJData[_k]; CommunityData datK = communityData.get(k); double newQ = datJ.nodeToQ.get(k); //if(datJ.nodeToQ.containsKey(i)){ // newQ = datJ.nodeToQ.get(i); //} if (datI.nodeToQ.containsKey(k)) { newQ = newQ + datI.nodeToQ.get(k); datK.delLink(i); } // K connected to I and J else { newQ = newQ - 2 * datI.DegFrac * datK.DegFrac; } // K connected to J not I datJ.addQ(k, newQ); datK.addQ(j, newQ); addToHeap(createEdge(newQ, Math.min(j, k), Math.max(j, k))); } int[] datIData = datI.nodeToQ.keys(); for(int _k = 0; _k < datIData.length; _k++){ int k = datIData[_k]; if (!datJ.nodeToQ.containsKey(k)) { // K connected to I not J CommunityData datK = communityData.get(k); double newQ = datI.nodeToQ.get(k) - 2 * datJ.DegFrac * datK.DegFrac; datJ.addQ(k, newQ); datK.delLink(i); datK.addQ(j, newQ); addToHeap(createEdge(newQ, Math.min(j, k), Math.max(j, k))); } } datJ.DegFrac += datI.DegFrac; if (datJ.nodeToQ.isEmpty()) { communityData.remove(j); } // isolated community (done) communityData.remove(i); return true; } }
UPDATE: the current code is pretty fast and has half the memory usage compared to the โfastestโ solution, but only ~ 5% slower. the difference lies in using hashmap + treest vs priority queue and providing only one object for a given pair i, j exists at any time.