How to count factor occurrences in several columns, grouping by one column?

Question

I have a seemingly simple question, but I can’t figure out how to get exactly what I want.

My data is as follows:

    Job        C/C++  Java   Python
    Student    FALSE  TRUE   FALSE
    Developer  TRUE   TRUE   TRUE
    Developer  TRUE   TRUE   FALSE
    Sysadmin   TRUE   FALSE  FALSE
    Student    FALSE  TRUE   TRUE

I would like to group by the column "Job" and count the number of TRUE values in each of the other columns. My desired result would look like this:

    Job        C/C++  Java  Python
    Student    0      2     1
    Developer  2      2     1
    Sysadmin   1      0     0

Any help would be greatly appreciated.

+6

r aggregate

user2145843 Mar 07 '13 at 19:47


2 answers

Assuming your data.frame is called "temp", just use aggregate:

    aggregate(. ~ Job, temp, sum)
    #         Job CC. Java Python
    # 1 Developer   2    2      1
    # 2   Student   0    2      1
    # 3  Sysadmin   1    0      0

This works because TRUE and FALSE are coerced to the numeric values 1 and 0 when summed, so using sum as the aggregation function counts the TRUE values in each group.
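To see the coercion on its own, here is a minimal base R sketch. The vector name `flags` is only illustrative and does not come from the original post.

```r
# A logical vector: TRUE counts as 1 and FALSE as 0 in arithmetic.
flags <- c(TRUE, FALSE, TRUE, TRUE)

n_true    <- sum(flags)         # number of TRUE values
prop_true <- mean(flags)        # proportion of TRUE values
as_ints   <- as.integer(flags)  # explicit coercion to 1/0

print(n_true)
print(prop_true)
print(as_ints)
```

The same coercion is what lets `sum` act as a per-group counter inside `aggregate`.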

And to add a "tidyverse" solution for completeness:

    library(tidyverse)
    temp %>%
      group_by(Job) %>%
      summarise_all(sum)
    # # A tibble: 3 x 4
    #   Job         CC.  Java Python
    #   <chr>     <int> <int>  <int>
    # 1 Developer     2     2      1
    # 2 Student       0     2      1
    # 3 Sysadmin      1     0      0
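As a side note, `summarise_all()` has been superseded since dplyr 1.0.0 in favour of `across()`. The following is a sketch of the equivalent call, assuming dplyr 1.0.0 or later is installed; the result name `counts` is illustrative, and `temp` is rebuilt here only so the snippet runs by itself.

```r
library(dplyr)

# Same data as in the question, rebuilt so the example is self-contained.
temp <- data.frame(
  Job    = c("Student", "Developer", "Developer", "Sysadmin", "Student"),
  CC.    = c(FALSE, TRUE, TRUE, TRUE, FALSE),
  Java   = c(TRUE, TRUE, TRUE, FALSE, TRUE),
  Python = c(FALSE, TRUE, FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# across(everything(), sum) applies sum to every non-grouping column,
# so the TRUE values are counted per Job.
counts <- temp %>%
  group_by(Job) %>%
  summarise(across(everything(), sum))

print(counts)
```

Within a grouped data frame, `everything()` skips the grouping column, so `Job` does not need to be excluded by hand.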

Here is your data in a format that is easy to copy and paste. It was obtained using dput(your-actual-data-frame-name), which is what you should use in the future when posting R questions on Stack Overflow.

    temp <- structure(list(
      Job = c("Student", "Developer", "Developer", "Sysadmin", "Student"),
      CC. = c(FALSE, TRUE, TRUE, TRUE, FALSE),
      Java = c(TRUE, TRUE, TRUE, FALSE, TRUE),
      Python = c(FALSE, TRUE, FALSE, FALSE, TRUE)),
      .Names = c("Job", "CC.", "Java", "Python"),
      class = "data.frame", row.names = c(NA, -5L))
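For readers unfamiliar with `dput`, this is a small sketch of the round trip it enables: writing an object out as R code and reading it back with `dget`. The names `dump_file` and `restored` are illustrative, not part of the original answer.

```r
# Rebuild the example data so the snippet is self-contained.
temp <- data.frame(
  Job    = c("Student", "Developer", "Developer", "Sysadmin", "Student"),
  CC.    = c(FALSE, TRUE, TRUE, TRUE, FALSE),
  Java   = c(TRUE, TRUE, TRUE, FALSE, TRUE),
  Python = c(FALSE, TRUE, FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Print a structure(...) call to the console; this text can be pasted into a post.
dput(temp)

# Write the same representation to a temporary file and read it back.
dump_file <- tempfile(fileext = ".R")
dput(temp, file = dump_file)
restored <- dget(dump_file)

# Check that the restored object matches the original.
print(all.equal(restored, temp))
```

Anyone pasting the printed `structure(...)` call into their own session gets the same column types as the poster, which a plain printed table does not guarantee.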

+9

A5C1D2H2I1M1N2O1R2T1 Mar 07 '13 at 19:49


Arun · Accepted Answer · 2013-03-07T20:03:44+0000

Alternative solutions using plyr and data.table:

data.table:

    require(data.table)
    tmp.dt <- data.table(temp, key = "Job")
    tmp.dt[, lapply(.SD, sum), by = Job]
    #          Job CC. Java Python
    # 1: Developer   2    2      1
    # 2:   Student   0    2      1
    # 3:  Sysadmin   1    0      0

plyr:

    require(plyr)
    ddply(temp, .(Job), function(x) colSums(x[-1]))
    #         Job CC. Java Python
    # 1 Developer   2    2      1
    # 2   Student   0    2      1
    # 3  Sysadmin   1    0      0

Edit: If, instead of TRUE/FALSE, the columns hold values such as "Newbie" and you need to count the occurrences of that value, then:

With data.table:

    require(data.table)
    tmp.dt <- data.table(temp, key = "Job")
    tmp.dt[, lapply(.SD, function(x) sum(x == "Newbie")), by = Job]

With plyr:

    require(plyr)
    ddply(temp, .(Job), function(x) colSums(x[-1] == "Newbie"))
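The same idea can be sketched in base R without extra packages. The data frame `skills` and its values below are made up purely for illustration, since the original post does not show data containing "Newbie".

```r
# Hypothetical data: skill levels stored as strings instead of TRUE/FALSE.
skills <- data.frame(
  Job    = c("Student", "Developer", "Developer", "Sysadmin", "Student"),
  Java   = c("Newbie", "Expert", "Newbie", "None", "Newbie"),
  Python = c("None", "Newbie", "Expert", "Newbie", "Newbie"),
  stringsAsFactors = FALSE
)

# Compare every skill column against "Newbie" to get a logical matrix,
# then sum the TRUE values within each Job.
newbie_counts <- aggregate(
  skills[-1] == "Newbie",
  by  = list(Job = skills$Job),
  FUN = sum
)

print(newbie_counts)
```

The comparison turns the problem back into the TRUE/FALSE case, so `sum` can again be used as the counter.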
