Count the number of instances in which a variable or combination of variables is TRUE

I am an enthusiastic R newbie who needs help! :)

I have a data frame that looks like this:

    id <- c(100, 200, 300, 400)
    a <- c(1, 1, 0, 1)
    b <- c(1, 0, 1, 0)
    c <- c(0, 0, 1, 1)
    y <- data.frame(id = id, a = a, b = b, c = c)

Here id is a unique identifier (for example, one per person), and a, b and c are dummy variables indicating whether the person has that feature or not (as usual, 1 = TRUE).

I want R to create a matrix or data frame in which a, b and c are both the column names and the row names. For each cell, R should calculate the number of ids that have that feature or that combination of features.

So, for example, ids 100, 200 and 400 have feature a, so on the diagonal of the matrix, where a and a cross, R would enter 3. Only id 100 has both features a and b, so R would enter 1 where a and b cross, and so on.

The resulting data frame should look like this:

    l <- c("", "a", "b", "c")
    m <- c("a", 3, 1, 1)
    n <- c("b", 1, 2, 1)
    o <- c("c", 1, 1, 2)
    result <- matrix(c(l, m, n, o), nrow = 4, ncol = 4)

Since my data set contains 10 variables and hundreds of observations, I will have to automate the whole process.

Your help will be greatly appreciated. Thank you very much!

2 answers

With base R:

    crossprod(as.matrix(y[,-1]))
    #   a b c
    # a 3 1 1
    # b 1 2 1
    # c 1 1 2
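A note on why this works: crossprod(Y) computes t(Y) %*% Y, so cell (i, j) is the sum over rows of Y[, i] * Y[, j]. For 0/1 columns that product is 1 only when an id has both features, so the sum is the count of ids with that pair, and the diagonal is the count for each feature alone. Below is a minimal sketch that checks this against an explicit count, using the data from the question. The helper name count_both is made up for illustration.

```r
# Data from the question
y <- data.frame(id = c(100, 200, 300, 400),
                a = c(1, 1, 0, 1),
                b = c(1, 0, 1, 0),
                c = c(0, 0, 1, 1))

Y <- as.matrix(y[, -1])      # drop the id column, keep the 0/1 dummies
counts <- crossprod(Y)       # same as t(Y) %*% Y

# Hypothetical helper: count the rows where both features equal 1
count_both <- function(i, j) sum(Y[, i] == 1 & Y[, j] == 1)

# Explicit pairwise count, for comparison with crossprod()
vars <- colnames(Y)
manual <- outer(vars, vars, Vectorize(count_both))
dimnames(manual) <- list(vars, vars)

all(counts == manual)        # do the two approaches agree?
```

The explicit version is only there to show what is being counted; for real data the single crossprod() call is the simpler choice.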

This is called an adjacency matrix. You can do this quite easily with the qdap package:

    library(qdap)
    adjmat(y[,-1])$adjacency
    ##   a b c
    ## a 3 1 1
    ## b 1 2 1
    ## c 1 1 2

It gives a warning because you are feeding it a data frame. The warning doesn't matter and can be ignored. Also notice that I dropped the first column (id) with negative indexing, y[, -1].

Note that since you started with a boolean matrix, you could get it with:

    Y <- as.matrix(y[,-1])
    t(Y) %*% Y
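Since the question mentions about 10 variables and hundreds of observations, here is a rough sketch of the same matrix product on simulated data. The sizes, the seed and the column names v1..v10 are assumptions for illustration, not the asker's actual data. It also selects the dummy columns by name rather than by position, which may be safer if id is not the first column.

```r
set.seed(1)        # arbitrary seed, for reproducibility only
n_obs  <- 300      # assumed number of observations
n_vars <- 10       # assumed number of dummy variables

# Simulated 0/1 dummies with made-up column names v1..v10
sim <- as.data.frame(matrix(rbinom(n_obs * n_vars, 1, 0.5), nrow = n_obs))
names(sim) <- paste0("v", seq_len(n_vars))
sim <- cbind(id = seq_len(n_obs), sim)

# Select the dummy columns by name instead of by position
feature_cols <- setdiff(names(sim), "id")
Y <- as.matrix(sim[, feature_cols])

counts <- t(Y) %*% Y                  # 10 x 10 matrix of pairwise counts
counts_df <- as.data.frame(counts)    # data frame version, if preferred

dim(counts)
```

Nothing in the code depends on the number of columns, so the same lines should work unchanged for any set of 0/1 dummy variables.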
