Community Discovery Using InfoMap Algorithm Producing One Massive Module

I use the InfoMap algorithm in the igraph package to perform community detection on a directional and non-weighted graph (34,943 vertices, 206,366 edges). In the graph, the vertices represent the sites, and the edges represent the presence of a hyperlink between the websites.

The problem that I encountered after running the algorithm is that most of the vertices have membership in one massive community (32920 or 94%). The remaining peaks are scattered in hundreds of other tiny communities.

I tried different settings with the nb.trials parameter (i.e. 50, 100 and now 500 works). However, this does not greatly change the result.

I feel pretty annoying because the execution time of the algorithm is quite long, so I have to wait every time for the results (no luck!).

Many thanks.

+8
r igraph sna
source share
2 answers

Thanks for all the great comments. In the end, I got it by downloading and running the source code for Infomap, which is available at http://www.mapequation.org/code.html .

Due to license issues, the latest code was not integrated with igraph .

This solved the problem of too many nodes that were "concentrated" in one massive community.

In particular, I used the following parameters on the command line: -N 10 --directed --two-level --map

Kudos to Martin Rosval from the Infomap project for providing me with detailed help to solve this problem.

For the interested reader, here is more information about this issue:

When a network crashes into one main cluster, this is most often due to a very dense and random link structure ... Teleportation is encoded in the code for directional networks implemented in iGraph. If many nodes do not have outgoing links, the teleportation effect can be significant, since it accidentally connects nodes. We created a new code here: http://www.mapequation.org/code.html , which can cluster a network without encoding random teleportation necessary to increase ergodicity of the dynamics. See this document for more details: http://pre.aps.org/abstract/PRE/v85/i5/e056107

+8
source share

I was going to put this in a comment, but in this format it was too long and hard to read, so this is a tangential answer.

One thing you should do is evaluate if the community structure search algorithm works well. You can try to visualize your network to set:

  • Is an algorithm that returns community structures meaningful? Maybe there is one mass community?
  • If not, then visualization gives an idea of โ€‹โ€‹why?

This will help you tell you the next steps. Maybe the network structure requires a different algorithm?

One thing that I find useful for large networks is to draw your edges as a heat map. This is easy to do if you have edges stored in the adjacency matrix.

To do this, you can use the image function by passing the argument z to your edge matrix. Hope this allows you to see the prominent community structure.

However, you also want to evaluate the correctness of your algorithm, so you want to sort the nodes (rows and columns of the adjacency matrix) by the community to which they were assigned.

Another thing to note is that if your ribs are pointed, it can be harder to evaluate with your eyes, since the edges may appear on either side of the diagonal of the heat map. One thing you can do is construct the underlying graph - this is an adjacency matrix, assuming your edges are not oriented.

If your algorithm does a good job, you expect to see square blocks diagonally, one for each discovered community.

+4
source share

All Articles