Analysis of a social graph: 60 GB and 100 million nodes

Good evening,

I am trying to analyze data of the size described in the title (edgelist or Pajek format). My first thought was an R project with the igraph package, but with only 6 GB of memory that will not do the trick. Could a 128 GB machine handle the data? Are there alternatives that do not require holding the full graph in RAM?

Thanks in advance.

PS: I found several programs, but I would like to hear your opinion (yes, yours) about them.

+5
1 answer

If you only need degree distributions, you most likely do not need a graph package at all. I recommend the bigtabulate package because:

  • your R objects are file-backed on disk, so you are not limited by RAM
  • the work can be parallelized across cores with foreach

Here is a small worked example. First, simulate an edgelist with 1 million nodes and 1 million edges.

set.seed(1)
N <- 1e6   # number of nodes
M <- 1e6   # number of edges
# Each row is one directed edge: sender -> receiver
edgelist <- cbind(sample(1:N,M,replace=TRUE),
                  sample(1:N,M,replace=TRUE))
colnames(edgelist) <- c("sender","receiver")
write.table(edgelist,file="edgelist-small.csv",sep=",",
            row.names=FALSE,col.names=FALSE)

Then concatenate 10 copies of it to get a larger file with 10 million edges.

system("
for i in $(seq 1 10) 
do 
  cat edgelist-small.csv >> edgelist.csv 
done")

bigtabulate builds on the bigmemory package, so read the edgelist in with read.big.matrix(), which creates a file-backed big.matrix rather than an ordinary in-memory R object.

library(bigtabulate)
# File-backed read: the data live in edgelist.bin on disk,
# not in the R session's memory
x <- read.big.matrix("edgelist.csv", header = FALSE,
                     type = "integer", sep = ",",
                     backingfile = "edgelist.bin",
                     descriptorfile = "edgelist.desc")
nrow(x)  # 1e7 as expected
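
Because the matrix is file-backed, a later R session can reattach it through the descriptor file instead of re-parsing the 10-million-row CSV. A minimal sketch, assuming the edgelist.desc descriptor created above sits in the working directory:

# Reattach the existing file-backed matrix; nothing is re-read into RAM
library(bigmemory)
x <- attach.big.matrix("edgelist.desc")
dim(x)  # 1e7 rows, 2 columns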

Now a call to bigtable() tabulates the first (sender) column, giving the out-degree of every node that has at least one outgoing edge.

outdegree <- bigtable(x,1)
head(outdegree)
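
If the goal is the out-degree distribution itself rather than per-node degrees, the returned table can be tabulated once more. A small sketch in base R using the outdegree vector from above:

# How many nodes have out-degree 1, 2, 3, ... (nodes with no outgoing
# edges do not appear in outdegree and are not counted here)
outdeg_dist <- table(outdegree)
head(outdeg_dist)

# log-log plot, the usual view for social-network degree data
plot(as.numeric(names(outdeg_dist)), as.numeric(outdeg_dist),
     log = "xy", xlab = "out-degree", ylab = "number of nodes")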

As a sanity check, verify the result for the first node by counting its outgoing edges manually:

# Check table worked as expected for first "node"
j <- as.numeric(names(outdegree[1]))  # get name of first node
all.equal(as.numeric(outdegree[1]),   # outdegree answer
          sum(x[,1]==j))              # manual outdegree count

For in-degrees, tabulate the second (receiver) column with bigtable(x,2).
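
If the tabulation itself becomes slow, the foreach integration mentioned above can spread the work over row chunks. A rough sketch, assuming a doParallel backend and the edgelist.desc descriptor from earlier; the chunking scheme and chunk count are illustrative, not part of the original answer:

library(foreach)
library(doParallel)
library(bigmemory)
library(bigtabulate)

registerDoParallel(cores = 4)  # fork-based; on Windows register a cluster instead

# Contiguous row chunks, one per worker
chunks <- split(1:nrow(x), cut(1:nrow(x), 4, labels = FALSE))

# Each worker reattaches the file-backed matrix and tabulates its own chunk
partial <- foreach(idx = chunks,
                   .packages = c("bigmemory", "bigtabulate")) %dopar% {
  xi <- attach.big.matrix("edgelist.desc")
  bigtable(xi[idx, , drop = FALSE], 1)
}

# Sum the per-chunk counts into a single out-degree vector
all_counts <- unlist(unname(partial))
outdegree_par <- tapply(all_counts, names(all_counts), sum)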

+6
