R: count unique values per column, grouped by the values of a single column

I want to count the number of unique values in each column for each value of var_1.

For instance:

Test <- data.frame(
  var_1 = c("a", "a", "a", "b", "b", "c", "c", "c", "c", "c"),
  var_2 = c("bl", "bf", "bl", "bl", "bf", "bl", "bl", "bf", "bc", "bg"),
  var_3 = c("cf", "cf", "eg", "cf", "cf", "eg", "cf", "dr", "eg", "fg")
)

The results I'm looking for are grouped by the values in var_1 and should be:

var_1 var_2 var_3
a     2     2
b     2     1
c     4     4

However, after trying various methods (including apply and table), aggregate came closest to what I'm looking for. The call below, though, returns the total number of records for each var_1 value rather than the number of unique values:

agbyv1= aggregate(. ~ var_1, Test, length) 

var_1 var_2 var_3
a     3     3
b     2     2
c     5     5

I tried

unqbyv1= aggregate(. ~ var_1, Test, length(unique(x)))

but it didn’t work.

Any help is greatly appreciated.

2 answers

Try

library(dplyr)
# dplyr >= 1.0.0; older versions used summarise_each(funs(n_distinct(.)))
Test %>%
  group_by(var_1) %>%
  summarise(across(everything(), n_distinct))

or

library(data.table)  # uniqueN() needs v1.9.5+
setDT(Test)[, lapply(.SD, uniqueN), var_1]

If the data contain NA values that should not be counted:

setDT(Test)[, lapply(.SD, function(x) uniqueN(na.omit(x))), var_1]

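Or you can use aggregate. With the formula interface the default is na.action = na.omit, so rows containing NA are dropped before the function is applied and no extra NA handling is needed. Note that FUN has to be a function, not an expression such as length(unique(x)).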

aggregate(. ~ var_1, Test, FUN = function(x) length(unique(x)))
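
For what it's worth, here is a minimal sketch of why the attempt in the question fails and what a working call returns on the example data. It uses the non-formula (by =) interface of aggregate rather than the formula call above, and count_unique is a helper name made up for this illustration, not part of any package.

```r
# Example data from the question
Test <- data.frame(
  var_1 = c("a", "a", "a", "b", "b", "c", "c", "c", "c", "c"),
  var_2 = c("bl", "bf", "bl", "bl", "bf", "bl", "bl", "bf", "bc", "bg"),
  var_3 = c("cf", "cf", "eg", "cf", "cf", "eg", "cf", "dr", "eg", "fg")
)

# FUN must be a function object. Passing length(unique(x)) directly asks R to
# evaluate that expression first, and no object x exists at that point.
# count_unique is a hypothetical helper name used only for this sketch.
count_unique <- function(x) length(unique(x))

# Group every column except var_1 by var_1 and count distinct values per group
res <- aggregate(Test[-1], by = Test["var_1"], FUN = count_unique)
print(res)
#   var_1 var_2 var_3
# 1     a     2     2
# 2     b     2     1
# 3     c     4     4
```

Group c has four distinct var_2 values (bl, bf, bc, bg), so its count is 4.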

A base R alternative:

apply(Test[-1], 2, function(y) tapply(y, Test$var_1, function(x) length(unique(x))))
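
Note that this approach returns a matrix with the var_1 values as row names, not a data frame with a var_1 column. If the layout from the question is wanted, something like the following sketch could convert it. It swaps apply for sapply (so the columns are not first coerced to a matrix), and the names counts and out are made up for illustration.

```r
# Example data from the question
Test <- data.frame(
  var_1 = c("a", "a", "a", "b", "b", "c", "c", "c", "c", "c"),
  var_2 = c("bl", "bf", "bl", "bl", "bf", "bl", "bl", "bf", "bc", "bg"),
  var_3 = c("cf", "cf", "eg", "cf", "cf", "eg", "cf", "dr", "eg", "fg")
)

# For each column except var_1, count distinct values within each var_1 group.
# The result is a matrix: one row per group, one column per variable.
counts <- sapply(Test[-1], function(col) {
  tapply(col, Test$var_1, function(x) length(unique(x)))
})

# Move the row names into a regular var_1 column
out <- data.frame(var_1 = rownames(counts), counts, row.names = NULL)
print(out)
```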