R: count unique values in each column, grouped by the values of a single column

Question

I want to count the number of unique values in each column, separately for each value of var_1.

For instance:

Test <- data.frame(
  var_1 = c("a", "a", "a", "b", "b", "c", "c", "c", "c", "c"),
  var_2 = c("bl", "bf", "bl", "bl", "bf", "bl", "bl", "bf", "bc", "bg"),
  var_3 = c("cf", "cf", "eg", "cf", "cf", "eg", "cf", "dr", "eg", "fg")
)

The result I'm looking for has one row per value of var_1 and should be:

var_1 var_2 var_3
a     2     2
b     2     1
c     4     4

After trying various approaches (including apply and table), aggregate came closest to what I'm looking for. However, the call below returns the total number of records for each var_1 value, not the number of unique values:

agbyv1= aggregate(. ~ var_1, Test, length) 

var_1 var_2 var_3
a     3     3
b     2     2
c     5     5

I tried

unqbyv1= aggregate(. ~ var_1, Test, length(unique(x)))

but it didn’t work.

Any help is greatly appreciated.

r aggregate unique

Ina.Quest May 05 '15 at 18:48

2 answers

A base R option using apply and tapply:

apply(Test[-1] , 2 , function(y) tapply(y,Test$var_1,function(x) length(unique(x))))

Eric Brooks May 05 '15 at 18:54

akrun · Accepted Answer · 2015-05-05T18:50:56+0000

Try

library(dplyr)
Test %>%
      group_by(var_1) %>% 
      summarise_each(funs(n_distinct(.)))
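
In dplyr 1.0.0 and later, summarise_each() and funs() are deprecated in favour of across(). A sketch of the same grouped count with the newer interface, assuming dplyr >= 1.0.0 is installed and using the sample data from the question (the name res is only for illustration):

```r
library(dplyr)

# Sample data from the question
Test <- data.frame(
  var_1 = c("a", "a", "a", "b", "b", "c", "c", "c", "c", "c"),
  var_2 = c("bl", "bf", "bl", "bl", "bf", "bl", "bl", "bf", "bc", "bg"),
  var_3 = c("cf", "cf", "eg", "cf", "cf", "eg", "cf", "dr", "eg", "fg")
)

# everything() selects all non-grouping columns; n_distinct counts distinct values
res <- Test %>%
  group_by(var_1) %>%
  summarise(across(everything(), n_distinct))

print(res)
```

If NA should not count as a value, `~ n_distinct(.x, na.rm = TRUE)` can be passed to across() in place of n_distinct.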

or

library(data.table)#v1.9.5+
setDT(Test)[, lapply(.SD, uniqueN), var_1]

If there are NA values that should not be counted:

setDT(Test)[, lapply(.SD, function(x) uniqueN(na.omit(x))), var_1]

Or you can use aggregate. With the formula interface the default is na.action = na.omit, which drops every row containing an NA before FUN is applied, so no extra NA handling is needed.

aggregate(.~ var_1, Test, FUN=function(x) length(unique(x)) )
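
As for why the attempt in the question failed: the FUN argument of aggregate must be a function, whereas length(unique(x)) is an expression that R tries to evaluate immediately, before any grouping takes place. Below is a minimal base R sketch using the data-frame method of aggregate, which takes the grouping column through by and so does not depend on how the formula interface treats character versus factor columns. The helper count_unique and its na_rm switch are hypothetical names used only for illustration.

```r
# Sample data from the question
Test <- data.frame(
  var_1 = c("a", "a", "a", "b", "b", "c", "c", "c", "c", "c"),
  var_2 = c("bl", "bf", "bl", "bl", "bf", "bl", "bl", "bf", "bc", "bg"),
  var_3 = c("cf", "cf", "eg", "cf", "cf", "eg", "cf", "dr", "eg", "fg")
)

# Hypothetical helper: number of distinct values, optionally ignoring NA
count_unique <- function(x, na_rm = FALSE) {
  if (na_rm) x <- x[!is.na(x)]
  length(unique(x))
}

# Pass the function itself as FUN; aggregate calls it once per group and column
res <- aggregate(Test[-1], by = Test["var_1"], FUN = count_unique)
print(res)

# Arguments after FUN are forwarded to it, here to leave NA out of the count
res_no_na <- aggregate(Test[-1], by = Test["var_1"],
                       FUN = count_unique, na_rm = TRUE)
```

Unlike na.omit in the formula interface, this variant removes NA per column inside each group rather than discarding whole rows.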
