Generation and Summation Matrix

I'm relatively new to R, so forgive me for what I assume is a relatively simple question.

I have data in the form

  1 2 3 4 5
A 0 1 1 0 0
B 1 0 1 0 1
C 0 1 0 1 0
D 1 0 0 0 0
E 0 0 0 0 1

where A-E are people and 1-5 are binary indicators of whether they have a given quality. I need to create an A-E by A-E matrix where cell (A, B) = 1 if A and B share at least one quality, that is, if the values for any quality 1-5 sum to 2 across A and B. A simple 5x5 example would be:

  A B C D E
A 1
B 1 1
C 1 0 1
D 0 1 0 1
E 0 1 0 0 1

Then I need to sum the whole matrix. (For the example above, the sum is 9.) I have thousands of observations, so I can't do it manually. I'm sure this takes only a few simple lines of code; I'm just not experienced enough.

Thanks!

EDIT: I imported the data from a CSV file with the columns (1-5 above) as variables; in the real data I have 40 variables. A-E are unique observations (people), about 2000 of them. I would also like to know how to first convert this to a matrix so that I can apply the excellent answers you have already provided. Thanks!

+8
matrix r
3 answers

Here you can use matrix multiplication:

 out <- tcrossprod(m)
 #   A B C D E
 # A 2 1 1 0 0
 # B 1 3 0 1 1
 # C 1 0 2 0 0
 # D 0 1 0 1 0
 # E 0 1 0 0 1

Then set the diagonal to one, if necessary

 diag(out) <- 1 

As DavidA points out in the comments, tcrossprod(m) is basically m %*% t(m).
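
To make the equivalence concrete, here is a minimal sketch using the example data from the question. It also thresholds the result, because each entry of tcrossprod(m) counts the qualities a pair shares, and with more variables those counts can exceed 1. The name shared is only illustrative.

```r
# Example data from the question: rows are people A-E, columns are qualities 1-5.
m <- matrix(c(0, 1, 1, 0, 0,
              1, 0, 1, 0, 1,
              0, 1, 0, 1, 0,
              1, 0, 0, 0, 0,
              0, 0, 0, 0, 1),
            nrow = 5, byrow = TRUE,
            dimnames = list(c("A", "B", "C", "D", "E"), NULL))

# tcrossprod(m) and m %*% t(m) give the same counts of shared qualities.
counts <- tcrossprod(m)
stopifnot(all(counts == m %*% t(m)))

# Turn counts into a 0/1 indicator: 1 if the pair shares at least one quality.
shared <- (counts > 0) + 0L
shared
```

With this thresholding, the diagonal is 1 for anyone who has at least one quality; a person with no qualities at all would get 0 there unless the diagonal is set explicitly.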

There are several ways to calculate the sum; here is one:

 sum(out[upper.tri(out, diag=TRUE)] , na.rm=TRUE) 
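
Regarding the EDIT in the question, a rough sketch of the whole path from CSV to the final sum might look like the following. The column layout (an id column followed by 0/1 columns) and the names csv_text, q1 to q5, shared and total are assumptions for illustration; the real file will differ.

```r
# Hypothetical CSV content standing in for the real file. With a file on disk,
# read.csv("your_file.csv", row.names = 1) would be the analogous call.
csv_text <- "id,q1,q2,q3,q4,q5
A,0,1,1,0,0
B,1,0,1,0,1
C,0,1,0,1,0
D,1,0,0,0,0
E,0,0,0,0,1"

df <- read.csv(text = csv_text, row.names = 1)  # first column becomes the row names
m <- as.matrix(df)                              # 0/1 matrix: people x qualities

shared <- (tcrossprod(m) > 0) + 0L              # 1 if a pair shares any quality
diag(shared) <- 1L                              # count each person with themselves as 1
total <- sum(shared[upper.tri(shared, diag = TRUE)])
total  # 9 for the example data
```

For about 2000 people and 40 variables, the intermediate matrix is 2000 x 2000, which should fit comfortably in memory.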
+6

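You can use outer, where m is your matrix with one row per person:

 # 1 if rows u and v both have a 1 in some column, else 0
 f = Vectorize(function(u, v) any(colSums(m[c(u, v), ]) > 1) + 0L)
 # index over rows (people), so this also works when m is not square
 res = outer(1:nrow(m), 1:nrow(m), FUN = f)
 colnames(res) = row.names(res) = rownames(m)
 #   A B C D E
 # A 1 1 1 0 0
 # B 1 1 0 1 1
 # C 1 0 1 0 0
 # D 0 1 0 1 0
 # E 0 1 0 0 1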

Data:

 m = structure(c(0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1), .Dim = c(5L, 5L), .Dimnames = list(c("A", "B", "C", "D", "E"), NULL)) 
+1

How about this? (Not as elegant as the tcrossprod solution.)

 d <- dim(m)                                # d[1] people, d[2] qualities
 ind <- expand.grid(1:d[1], 1:d[1])         # all ordered pairs of people
 # put the two rows of each pair side by side, then test for a shared 1
 M <- matrix(as.numeric(apply(cbind(m[ind[,2],], m[ind[,1],]), 1,
          function(x) sum(x[1:d[2]] == 1 & x[(d[2]+1):(d[2]*2)] == 1) >= 1)), ncol = d[1])
 rownames(M) = colnames(M) = rownames(m)
 M
 #   A B C D E
 # A 1 1 1 0 0
 # B 1 1 0 1 1
 # C 1 0 1 0 0
 # D 0 1 0 1 0
 # E 0 1 0 0 1
+1
