Count the number of occurrences of each unique value

Let's say I have:

v = rep(c(1,2, 2, 2), 25) 

Now I want to count the number of times each unique value appears. unique(v) returns the unique values, but not how many times each one occurs.

    > unique(v)
    [1] 1 2

I want something that gives me

    length(v[v==1])
    [1] 25
    length(v[v==2])
    [1] 75

but as a more general one-liner :) Something close (but not quite right) is this:

    #<doesn't work right>
    length(v[v==unique(v)])
+85
r count unique
Nov 18 '10 at 13:18
10 answers

Perhaps table() is what you are after?

    dummyData = rep(c(1, 2, 2, 2), 25)

    table(dummyData)
    # dummyData
    #  1  2
    # 25 75

    ## or another presentation of the same data
    as.data.frame(table(dummyData))
    #   dummyData Freq
    # 1         1   25
    # 2         2   75
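As a side note on why the attempt in the question falls short, here is a minimal sketch using the question's own vector v. The names recycled and manual are just illustrative. The comparison v == unique(v) recycles the shorter vector element by element instead of testing membership, whereas counting each unique value explicitly reproduces what table() reports.

```r
v <- rep(c(1, 2, 2, 2), 25)

# unique(v) has length 2, so it is recycled along v (1, 2, 1, 2, ...).
# The comparison is positional, not "does v match any unique value".
recycled <- length(v[v == unique(v)])

# Counting each unique value explicitly gives the same numbers as table().
manual <- sapply(unique(v), function(x) sum(v == x))
names(manual) <- unique(v)

print(recycled)
print(manual)
print(table(v))
```

The sapply() version is the question's length(v[v == x]) idea applied once per unique value; table() does the same job in one call.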
+116
Nov 18 '10 at 13:23

This is a one-line approach using aggregate():

    > aggregate(data.frame(count = v), list(value = v), length)
      value count
    1     1    25
    2     2    75
+11
Sep 12 '14 at 20:09

If you have several factors (i.e. a multi-dimensional data frame), you can use the dplyr package to count the unique values in each combination of factors:

    library("dplyr")
    data %>%
      group_by(factor1, factor2) %>%
      summarize(count = n())

It uses the pipe operator %>% to chain method calls on the data frame data.

+7
Sep 07 '15 at 19:08

To get an un-dimensioned integer vector containing the count of each unique value, use c().

    dummyData = rep(c(1, 2, 2, 2), 25)  # Chase's reproducible data
    c(table(dummyData))                 # get un-dimensioned integer vector
     1  2
    25 75

    str(c(table(dummyData)))            # confirm structure
     Named int [1:2] 25 75
     - attr(*, "names")= chr [1:2] "1" "2"

This can be useful if you need to pass the counts of unique values to another function, and it is shorter and more idiomatic than the t(as.data.frame(table(dummyData))[,2]) posted in a comment on Chase's answer. Thanks to Ricardo Saporte, who pointed this out to me.
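A small sketch of what "passing the counts to another function" might look like. The variable names proportions and most_common are only illustrative, and the two downstream calculations are arbitrary examples rather than anything from the comment thread.

```r
dummyData <- rep(c(1, 2, 2, 2), 25)
counts <- c(table(dummyData))   # named integer vector, no dim attribute

# A plain named vector works directly with ordinary vector functions.
proportions <- counts / sum(counts)
most_common <- names(counts)[which.max(counts)]

print(proportions)
print(most_common)
```

Note that the names are character strings ("1", "2"), so convert them with as.numeric() if you need the original values back.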

+6
Mar 30 '13 at 22:48
The table() function is a good way to go, as Chase suggested. If you are analyzing a large dataset, an alternative is to use the .N symbol from the data.table package.

Make sure you have installed the data.table package with

 install.packages("data.table") 

The code:

    # Import the data.table package
    library(data.table)

    # Generate a data table object, which draws a number 10^7 times
    # from 1 to 10 with replacement
    DT <- data.table(x = sample(1:10, 1E7, TRUE))

    # Count frequency of each factor level
    DT[, .N, by = x]
+5
Jan 17 '15 at 6:44

If you need the count of each unique value as an additional column in the data frame containing your values (a column that may represent the sample size, for example), plyr provides a neat way:

    data_frame <- data.frame(v = rep(c(1, 2, 2, 2), 25))

    library("plyr")
    data_frame <- ddply(data_frame, .(v), transform, n = length(v))
+3
May 8 '13 at 14:38

If you want to run unique on a data.frame (e.g. train.data) and also get the counts (which can be used as weights in classifiers), you can do the following:

    unique.count = function(train.data, all.numeric=FALSE) {
      # first convert each row in the data.frame to a string
      train.data.str = apply(train.data, 1, function(x) paste(x, collapse=','))
      # use table to index and count the strings
      train.data.str.t = table(train.data.str)
      # get the unique data string from the row.names
      train.data.str.uniq = row.names(train.data.str.t)
      weight = as.numeric(train.data.str.t)
      # convert the unique data string to data.frame
      if (all.numeric) {
        train.data.uniq = as.data.frame(t(apply(cbind(train.data.str.uniq), 1,
          function(x) as.numeric(unlist(strsplit(x, split=","))))))
      } else {
        train.data.uniq = as.data.frame(t(apply(cbind(train.data.str.uniq), 1,
          function(x) unlist(strsplit(x, split=",")))))
      }
      names(train.data.uniq) = names(train.data)
      list(data=train.data.uniq, weight=weight)
    }
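For comparison, a shorter base-R sketch of the same idea (distinct rows plus a count per row) using aggregate() with a formula. This is a swapped-in technique, not the function above, and it avoids pasting values into strings. The example data and the names with_ones and row.counts are hypothetical.

```r
# Hypothetical example data; the column names are placeholders.
train.data <- data.frame(a = c(1, 1, 2, 2, 2),
                         b = c("x", "x", "y", "y", "z"))

# Add a column of ones and sum it within each distinct row.
with_ones <- cbind(train.data, weight = 1)
row.counts <- aggregate(weight ~ ., data = with_ones, FUN = sum)

print(row.counts)
```

Be aware that the formula interface of aggregate() drops rows containing NA by default, so this sketch assumes complete data.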
0
Sep 12 '13 at 5:47

This works for me. Take your vector v:

    length(summary(as.factor(v), maxsum=50000))

Comment: set maxsum large enough to capture the number of unique values.

Or, with the magrittr package:

    v %>% as.factor %>% summary(maxsum=50000) %>% length
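Note that this returns how many distinct values there are, not the count of each one. A minimal sketch, assuming v has no missing values, comparing it with length(unique(v)), which gives the same number without a maxsum cap. The variable names are illustrative.

```r
v <- rep(c(1, 2, 2, 2), 25)

n_via_summary <- length(summary(as.factor(v), maxsum = 50000))
n_via_unique  <- length(unique(v))   # no maxsum cap to worry about

print(c(n_via_summary, n_via_unique))
```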

0
Jul 04 '16 at 0:17

Making the values categorical and calling summary() would also work.

    > v = rep(as.factor(c(1,2, 2, 2)), 25)
    > summary(v)
     1  2
    25 75
0
Sep 17 '17 at 2:06
    count_unique_words <- function(wlist) {
      ucountlist = list()
      unamelist = c()
      for (i in wlist) {
        if (is.element(i, unamelist))
          ucountlist[[i]] <- ucountlist[[i]] + 1
        else {
          listlen <- length(ucountlist)
          ucountlist[[i]] <- 1
          unamelist <- c(unamelist, i)
        }
      }
      ucountlist
    }

    expt_counts <- count_unique_words(population)
    for (i in names(expt_counts))
      cat(i, expt_counts[[i]], "\n")
-1
May 22 '13 at 7:49


