Count unique values

Question


Let's say I have:

v = rep(c(1,2, 2, 2), 25)

Now I want to count the number of times each unique value appears. unique(v) returns the unique values, but not how many times each one occurs.

    > unique(v)
    [1] 1 2

I want something that gives me

    length(v[v==1])
    [1] 25
    length(v[v==2])
    [1] 75

but as a more general one-liner :) Something close to (but not quite) this:

    #<doesn't work right>
    length(v[v==unique(v)])

+85

r count unique

gakera Nov 18 '10 at 13:18


10 answers

This is a one-line approach using aggregate.

    > aggregate(data.frame(count = v), list(value = v), length)
      value count
    1     1    25
    2     2    75

+11

SeaSprite Sep 12 '14 at 20:09


If you have several factors (i.e. a multi-column data frame), you can use the dplyr package to count the rows in each combination of factors:

    library("dplyr")
    data %>% group_by(factor1, factor2) %>% summarize(count=n())

It uses the pipe operator %>% to chain function calls on the data frame data.
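For readers without dplyr installed, a similar per-combination count can be sketched in base R. This is only an illustration: the data frame below, its column names factor1 and factor2, and the helper column count are all made up for the example.

```r
# Hypothetical data: two grouping columns, five rows
data <- data.frame(
  factor1 = c("a", "a", "b", "b", "b"),
  factor2 = c("x", "y", "x", "x", "y")
)

# Add a column of ones and sum it within each factor1/factor2 combination
counts <- aggregate(count ~ factor1 + factor2,
                    data = transform(data, count = 1L),
                    FUN = sum)
print(counts)
```

Only the combinations that actually occur in the data appear in the result.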

+7

antoine Sep 07 '15 at 19:08


To get an un-dimensioned integer vector containing the count of each unique value, use c().

    dummyData = rep(c(1, 2, 2, 2), 25) # Chase's reproducible data
    c(table(dummyData)) # get un-dimensioned integer vector
     1  2
    25 75
    str(c(table(dummyData))) # confirm structure
     Named int [1:2] 25 75
     - attr(*, "names")= chr [1:2] "1" "2"

This can be useful if you need to pass the counts of the unique values to another function, and it is shorter and more idiomatic than the t(as.data.frame(table(dummyData))[,2]) posted in a comment on Chase's answer. Thanks to Ricardo Saporte, who pointed this out to me.
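As a small sketch of what the named vector makes possible (the variable name counts is just for illustration), the counts can be looked up by value or used directly in arithmetic:

```r
v <- rep(c(1, 2, 2, 2), 25)
counts <- c(table(v))             # named integer vector: names are the values

counts[["2"]]                     # look up the count for value 2 by name
counts / sum(counts)              # relative frequencies, names preserved
names(counts)[which.max(counts)]  # most frequent value, as a character string
```

Note that the names are character strings, so a numeric value has to be converted (for example with as.character()) before it is used as an index.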

+6

Ben Mar 30 '13 at 22:48


The table() function is a good way, as suggested by Chase. If you are analyzing a large dataset, an alternative is to use the .N symbol from the data.table package.

Make sure you have installed the data.table package with:

 install.packages("data.table")

The code:

    # Import the data.table package
    library(data.table)

    # Generate a data table object, which draws a number 10^7 times
    # from 1 to 10 with replacement
    DT <- data.table(x=sample(1:10,1E7,TRUE))

    # Count Frequency of each factor level
    DT[,.N,by=x]
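When the values are small positive integers, as in this example, base R's tabulate() may be another option worth trying on large vectors. The sketch below is not part of the answer above; it uses a smaller sample and an arbitrary seed purely to keep the example quick and reproducible.

```r
set.seed(1)                       # arbitrary seed, only for reproducibility
x <- sample(1:10, 1e5, TRUE)      # fewer draws than the 1e7 used above

# tabulate() counts how often each of the integers 1..nbins occurs
freq <- tabulate(x, nbins = 10)
names(freq) <- 1:10               # label each count with its value
freq
```

Unlike table(), tabulate() only handles positive integers, so it is not a general replacement.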

+5

C. Zeng Jan 17 '15 at 6:44


If you need the count of each unique value as an additional column in the data frame containing your values (a column that could represent the sample size, for example), plyr provides a neat way:

    data_frame <- data.frame(v = rep(c(1,2, 2, 2), 25))
    library("plyr")
    data_frame <- ddply(data_frame, .(v), transform, n = length(v))
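A base R sketch of the same idea, for cases where plyr is not available, could use ave(). This is an alternative to the ddply call above, not a description of what plyr does internally.

```r
data_frame <- data.frame(v = rep(c(1, 2, 2, 2), 25))

# ave() applies FUN within each group of v and returns a vector
# aligned with the original rows
data_frame$n <- ave(data_frame$v, data_frame$v, FUN = length)
head(data_frame, 4)
```

Because ave() returns its result in the original row order, the data frame is not re-sorted by group.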

+3

lionel May 8 '13 at 14:38


If you want to run unique on a data.frame (for example, train.data) and also get the counts (which can be used as weights in classifiers), you can do the following:

    unique.count = function(train.data, all.numeric=FALSE) {
      # first convert each row in the data.frame to a string
      train.data.str = apply(train.data, 1, function(x) paste(x, collapse=','))
      # use table to index and count the strings
      train.data.str.t = table(train.data.str)
      # get the unique data string from the row.names
      train.data.str.uniq = row.names(train.data.str.t)
      weight = as.numeric(train.data.str.t)
      # convert the unique data string to data.frame
      if (all.numeric) {
        train.data.uniq = as.data.frame(t(apply(cbind(train.data.str.uniq), 1,
          function(x) as.numeric(unlist(strsplit(x, split=","))))))
      } else {
        train.data.uniq = as.data.frame(t(apply(cbind(train.data.str.uniq), 1,
          function(x) unlist(strsplit(x, split=",")))))
      }
      names(train.data.uniq) = names(train.data)
      list(data=train.data.uniq, weight=weight)
    }

0

user2771312 Sep 12 '13 at 5:47


This works for me. Take your vector v:

    length(summary(as.factor(v),maxsum=50000))

Comment: set maxsum large enough to capture the number of unique values.

Or, with the magrittr package:

    v %>% as.factor %>% summary(maxsum=50000) %>% length
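Note that this snippet returns how many distinct values there are, not how often each one occurs. As a quick sketch (the variable names are illustrative), the same number can be obtained with length(unique(v)), which avoids the maxsum setting altogether:

```r
v <- rep(c(1, 2, 2, 2), 25)

n_distinct_summary <- length(summary(as.factor(v), maxsum = 50000))
n_distinct_unique  <- length(unique(v))   # no maxsum limit to worry about

n_distinct_summary == n_distinct_unique
```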

0

Anthony Ebert Jul 04 '16 at 0:17


Converting the values to a factor and calling summary() will also do the job.

    > v = rep(as.factor(c(1,2, 2, 2)), 25)
    > summary(v)
     1  2
    25 75

0

sedeh Sep 17 '17 at 2:06


    count_unique_words <- function(wlist) {
      ucountlist = list()   # counts, keyed by word
      unamelist = c()       # words seen so far
      for (i in wlist) {
        if (is.element(i, unamelist))
          ucountlist[[i]] <- ucountlist[[i]] + 1
        else {
          ucountlist[[i]] <- 1
          unamelist <- c(unamelist, i)
        }
      }
      ucountlist
    }

    # population is not defined in this snippet; it is presumably a
    # character vector of words
    expt_counts <- count_unique_words(population)
    for (i in names(expt_counts))
      cat(i, expt_counts[[i]], "\n")

-1

Michael Wise May 22 '13 at 7:49


Chase · Accepted Answer · 2010-11-18 13:23

Perhaps table() is what you need?

    dummyData = rep(c(1,2, 2, 2), 25)

    table(dummyData)
    # dummyData
    #  1  2
    # 25 75

    ## or another presentation of the same data
    as.data.frame(table(dummyData))
    #   dummyData Freq
    # 1         1   25
    # 2         2   75
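To connect this back to the attempt in the question, here is a short sketch of why length(v[v==unique(v)]) does not give per-value counts, and how a manual count compares with table(). The variable name manual is only for illustration.

```r
v <- rep(c(1, 2, 2, 2), 25)

# v == unique(v) recycles c(1, 2) along v, comparing position by position
# (v[1] with 1, v[2] with 2, v[3] with 1, ...) rather than value by value
sum(v == unique(v))

# Counting each unique value explicitly gives the desired result
manual <- sapply(unique(v), function(x) sum(v == x))
names(manual) <- unique(v)
manual

# table() produces the same counts in one call
all(manual == c(table(v)))
```

The manual version scans v once per unique value, so table() is usually the more convenient choice.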
