How to count factor occurrences in several columns, grouping by one column?

I have a seemingly simple question, but I can't figure out how to get exactly what I want.

My data is as follows:

    Job        C/C++  Java   Python
    Student    FALSE  TRUE   FALSE
    Developer  TRUE   TRUE   TRUE
    Developer  TRUE   TRUE   FALSE
    Sysadmin   TRUE   FALSE  FALSE
    Student    FALSE  TRUE   TRUE

I would like to group by the column "Job" and count the number of TRUE values in each of the other columns. My desired result would look like this:

    Job        C/C++  Java  Python
    Student    0      2     1
    Developer  2      2     1
    Sysadmin   1      0     0

Any help would be greatly appreciated.

2 answers

Alternative solutions using plyr and data.table:

data.table:

    require(data.table)
    tmp.dt <- data.table(temp, key="Job")
    tmp.dt[, lapply(.SD, sum), by=Job]
    #          Job CC. Java Python
    # 1: Developer   2    2      1
    # 2:   Student   0    2      1
    # 3:  Sysadmin   1    0      0

plyr:

    require(plyr)
    ddply(temp, .(Job), function(x) colSums(x[-1]))
    #         Job CC. Java Python
    # 1 Developer   2    2      1
    # 2   Student   0    2      1
    # 3  Sysadmin   1    0      0

Edit: If, instead of TRUE/FALSE, the columns hold values such as "Newbie" and you need to count how often that value occurs, then:

With data.table:

    require(data.table)
    tmp.dt <- data.table(temp, key="Job")
    tmp.dt[, lapply(.SD, function(x) sum(x == "Newbie")), by=Job]

With plyr:

    require(plyr)
    ddply(temp, .(Job), function(x) colSums(x[-1] == "Newbie"))
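If neither package is available, the same count can be sketched in base R. This is only an illustration: the `skills` data frame and its values are made up for the example and are not from the question.

```r
# Hypothetical data: skill levels instead of TRUE/FALSE (not from the question)
skills <- data.frame(
  Job    = c("Student", "Developer", "Developer", "Student"),
  Java   = c("Newbie", "Expert", "Newbie", "Newbie"),
  Python = c("Expert", "Newbie", "Expert", "Newbie"),
  stringsAsFactors = FALSE
)

# For every column except Job, count the "Newbie" entries within each Job.
# col == "Newbie" yields a logical vector, which sum() treats as 0/1.
newbie_counts <- sapply(skills[-1], function(col) {
  tapply(col == "Newbie", skills$Job, sum)
})

print(newbie_counts)
```

The result is a matrix with one row per job and one column per skill, rather than a data frame.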

Assuming your data.frame is called "temp", just use aggregate:

    aggregate(. ~ Job, temp, sum)
    #         Job CC. Java Python
    # 1 Developer   2    2      1
    # 2   Student   0    2      1
    # 3  Sysadmin   1    0      0

The logic is that TRUE and FALSE correspond to the numeric values 1 and 0, so you can just use sum when aggregating.
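A quick way to see this coercion in isolation is the sketch below. It rebuilds the question's data with `data.frame()`, and the variable names `flags` and `totals` are only illustrative.

```r
# Logical values are coerced to integers in arithmetic: TRUE -> 1, FALSE -> 0
flags <- c(TRUE, FALSE, TRUE, TRUE)
as.integer(flags)   # 1 0 1 1
sum(flags)          # 3

# The data from the question, with the C/C++ column named CC.
temp <- data.frame(
  Job    = c("Student", "Developer", "Developer", "Sysadmin", "Student"),
  CC.    = c(FALSE, TRUE, TRUE, TRUE, FALSE),
  Java   = c(TRUE, TRUE, TRUE, FALSE, TRUE),
  Python = c(FALSE, TRUE, FALSE, FALSE, TRUE)
)

# Ungrouped totals: the number of TRUE values in each logical column
totals <- colSums(temp[-1])
print(totals)
```

Grouping by Job, as aggregate does above, applies the same summation within each job instead of over the whole column.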


And to add a "tidyverse" solution for completeness:

    library(tidyverse)
    temp %>%
      group_by(Job) %>%
      summarise_all(sum)
    # # A tibble: 3 x 4
    #   Job         CC.  Java Python
    #   <chr>     <int> <int>  <int>
    # 1 Developer     2     2      1
    # 2 Student       0     2      1
    # 3 Sysadmin      1     0      0

Here is your data in a format that is easy to copy and paste. It was obtained using dput(your-actual-data-frame-name), which is what you should use in the future when posting R questions on Stack Overflow.

    temp <- structure(list(
        Job = c("Student", "Developer", "Developer", "Sysadmin", "Student"),
        CC. = c(FALSE, TRUE, TRUE, TRUE, FALSE),
        Java = c(TRUE, TRUE, TRUE, FALSE, TRUE),
        Python = c(FALSE, TRUE, FALSE, FALSE, TRUE)),
      .Names = c("Job", "CC.", "Java", "Python"),
      class = "data.frame",
      row.names = c(NA, -5L))