Frequency table from multiple col and multiple rows in R

Question

I am trying to get the frequency table from this data frame:

tmp2 <- structure(list(a1 = c(1L, 0L, 0L), a2 = c(1L, 0L, 1L),
                       a3 = c(0L, 1L, 0L), b1 = c(1L, 0L, 1L),
                       b2 = c(1L, 0L, 0L), b3 = c(0L, 1L, 1L)),
                       .Names = c("a1", "a2", "a3", "b1", "b2", "b3"),
                       class = "data.frame", row.names = c(NA, -3L))


tmp2 <- read.csv("tmp2.csv", sep=";")
tmp2
> tmp2
  a1 a2 a3 b1 b2 b3
1  1  1  0  1  1  0
2  0  0  1  0  0  1
3  0  1  0  1  0  1

I am trying to get the frequency table as follows:

table(tmp2[,1:3], tmp2[,4:6])

But I get:

Error in sort.list(y) : 'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?

Expected Result:

Note: the result does not need to be a square matrix. For example, I should be able to add b4 and b5 while keeping a1, a2 and a3.

r frequency

S12000 Apr 13 '16 at 10:03

3 answers

Here is a possible solution:

aIdxs <- 1:3
bIdxs <- 4:6  # tmp2 has only six columns; use 4:7 once a b4 column is added

# init matrix
m <- matrix(0,
            nrow = length(aIdxs), ncol=length(bIdxs),
            dimnames = list(colnames(tmp2)[aIdxs],colnames(tmp2)[bIdxs]))

# create all combinations of a and b column indexes
idxs <- expand.grid(aIdxs,bIdxs)

# for each row and for each combination, add 1
# to the matrix if both the a and the b column are 1
for(r in 1:nrow(tmp2)){
  m <- m + matrix(apply(idxs,1,function(x){ all(tmp2[r,x]==1) }),
                  nrow=length(aIdxs), byrow=FALSE)
}
> m
   b1 b2 b3
a1  1  1  0
a2  2  1  1
a3  0  0  1

digEmAll Apr 13 '16 at 11:00

Here is another possible solution. Your input is a bit awkward for table(), because you have two sets of columns, "a" and "b", holding binary indicators, and each row marks pairwise co-occurrences between an "a" and a "b" that you want to count across all rows. Below is a generalized (though perhaps not very elegant) function that works with different numbers of a and b columns:

tmp2 <- structure(list(a1 = c(1L, 0L, 0L), a2 = c(1L, 0L, 1L),
                       a3 = c(0L, 1L, 0L), b1 = c(1L, 0L, 1L),
                       b2 = c(1L, 0L, 0L), b3 = c(0L, 1L, 1L)),
                  .Names = c("a1", "a2", "a3", "b1", "b2", "b3"),
                  class = "data.frame", row.names = c(NA, -3L))
fun = function(x) t(do.call("cbind", lapply(x[,grep("a", colnames(x))], 
    function(p) rowSums(do.call("rbind", lapply(x[,grep("b", colnames(x))], 
    function(q) q*p ))))))
fun(tmp2)
#> fun(tmp2)
#   b1 b2 b3
#a1  1  1  0
#a2  2  1  1
#a3  0  0  1

# let's try a bigger example
set.seed(1)
m = matrix(rbinom(size=1, n=50, prob=0.75), ncol=10, dimnames=list(paste("instance_", 1:5, sep=""), c(paste("a",1:4,sep=""), paste("b",1:6,sep=""))))

# Notice that the numbers of a and b columns are not equal
#> m
#           a1 a2 a3 a4 b1 b2 b3 b4 b5 b6
#instance_1  1  0  1  1  0  1  1  1  0  0
#instance_2  1  0  1  1  1  1  1  0  1  1
#instance_3  1  1  1  0  1  1  1  1  0  1
#instance_4  0  1  1  1  1  0  1  1  1  1
#instance_5  1  1  0  0  1  1  0  1  1  1

fun(as.data.frame(m))
#> fun(as.data.frame(m))
#   b1 b2 b3 b4 b5 b6
#a1  3  4  3  3  2  3
#a2  3  2  2  3  2  3
#a3  3  3  4  3  2  3
#a4  2  2  3  2  2  2

Teemu daniel laajala Apr 13 '16 at 11:30

nicola · Accepted Answer · 2016-04-13T11:17:42+0000

Option:

matrix(colSums(tmp2[,rep(1:3,3)] & tmp2[,rep(4:6,each=3)]),
       ncol=3,nrow=3,
       dimnames=list(colnames(tmp2)[1:3],colnames(tmp2)[4:6]))
#   b1 b2 b3
#a1  1  1  0
#a2  2  1  1
#a3  0  0  1

If you have a different number of a and b columns, you can try:

acols<-1:3 #state the indices of the a columns
bcols<-4:6 #same for b; if you add a column this should be 4:7
matrix(colSums(tmp2[,rep(acols,length(bcols))] & tmp2[,rep(bcols,each=length(acols))]),
           ncol=length(bcols),nrow=length(acols),
           dimnames=list(colnames(tmp2)[acols],colnames(tmp2)[bcols]))
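
As a side note, and not part of the answer above: since the columns are 0/1 indicators, the same co-occurrence counts can probably also be obtained with a matrix cross product. The sketch below assumes that the a and b columns can be picked out by the name prefixes "^a" and "^b" (an assumption made for illustration; index vectors such as acols and bcols work just as well) and that every value is 0 or 1.

```r
# Rebuild the example data from the question
tmp2 <- data.frame(a1 = c(1L, 0L, 0L), a2 = c(1L, 0L, 1L), a3 = c(0L, 1L, 0L),
                   b1 = c(1L, 0L, 1L), b2 = c(1L, 0L, 0L), b3 = c(0L, 1L, 1L))

# Assumption: column groups are identified by their name prefix
acols <- grep("^a", colnames(tmp2))
bcols <- grep("^b", colnames(tmp2))

# crossprod(x, y) computes t(x) %*% y; for 0/1 indicators, entry [i, j]
# is the number of rows in which a_i and b_j are both 1
res <- crossprod(as.matrix(tmp2[, acols, drop = FALSE]),
                 as.matrix(tmp2[, bcols, drop = FALSE]))
res
#    b1 b2 b3
# a1  1  1  0
# a2  2  1  1
# a3  0  0  1
```

Because the two groups are selected independently, this should also work when the numbers of a and b columns differ; the result then simply is not square.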
