How to check if a column contains only identical elements in R?

Sample data:

x <- matrix(c("Stack","Stack","Stack", "Overflow","Overflow","wolfrevO"), nrow=3,ncol=2) 

How can I check whether all elements of x[,1] are identical?

If x contains NAs, does the method still work?

Thanks.

6 answers

If you want to see which values are repeated and how many times, you can use table():

 table(x[,1])
 # Stack
 #     3
 table(x[,2])
 # Overflow wolfrevO
 #        2        1

To check whether a column contains only one unique value, test whether that table has exactly one entry with dim():

 dim(table(x[,1])) == 1
 # [1] TRUE
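A minimal sketch of how this check treats NA, using a small made-up vector `v`: table() leaves NA out by default (its useNA argument defaults to "no"), so missing values are silently ignored. Passing useNA = "ifany" counts NA as a value of its own.

```r
# Made-up example vector with one missing value
v <- c("Stack", "Stack", NA)

# Default: NA is not tabulated, so only "Stack" is counted
dim(table(v)) == 1
# [1] TRUE

# useNA = "ifany" adds an <NA> cell, so the vector is no longer uniform
dim(table(v, useNA = "ifany")) == 1
# [1] FALSE
```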

You can compare the first value of the vector with the rest of the vector.

 all(x[-1, 1] == x[1, 1])
 # [1] TRUE

If NA values are present, this exact method does not work, because comparing anything with NA returns NA. It can easily be fixed with na.omit(), for example:

 ## create a vector with an NA value
 x2 <- c(x[, 1], NA)
 ## standard check returns NA
 all(x2 == x2[1])
 # [1] NA
 ## call na.omit() to remove NAs, then compare
 all(na.omit(x2) == x2[1])
 # [1] TRUE

So, with your matrix x, the last line becomes:

 all(na.omit(x[-1, 1]) == x[1, 1]) 
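One remaining catch: if the first element is itself NA, the reference value is NA and the result is NA again. A sketch of a helper that drops NAs before comparing; the name `all_identical` is hypothetical (not a base R function), and returning TRUE for an empty or all-NA input is an assumption you may want to change.

```r
# Hypothetical helper: TRUE if all non-NA values of v are identical.
# NAs are removed first, so an NA in the first position does not matter.
all_identical <- function(v) {
  v <- v[!is.na(v)]
  # Assumption: an empty or all-NA input counts as identical
  # (all() of a zero-length logical vector is TRUE).
  all(v == v[1])
}

x <- matrix(c("Stack", "Stack", "Stack", "Overflow", "Overflow", "wolfrevO"),
            nrow = 3, ncol = 2)

all_identical(c(NA, x[, 1]))
# [1] TRUE
all_identical(x[, 2])
# [1] FALSE
apply(x, 2, all_identical)
# [1]  TRUE FALSE
```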

You can use the duplicated function to do this:

If sum(!duplicated(x[,1])) == 1 returns TRUE, the column contains only one distinct value:

 sum(!duplicated(x[,1])) == 1
 # [1] TRUE
 sum(!duplicated(x[,2])) == 1
 # [1] FALSE

If x contains NA, the method still works in the sense that an all-NA column returns TRUE and a column mixing NA with other values returns FALSE:

 x <- matrix(c(NA, NA, NA, "Overflow", "Overflow", NA), nrow = 3, ncol = 2)
 sum(!duplicated(x[,2])) == 1
 # [1] FALSE
 sum(!duplicated(x[,1])) == 1
 # [1] TRUE
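To run the same check over every column at once, a sketch using apply() on the NA matrix above; the variable name `n_values` is only for illustration. For atomic vectors, unique(col) keeps the same elements as col[!duplicated(col)], so the counts agree with length(unique(col)).

```r
x <- matrix(c(NA, NA, NA, "Overflow", "Overflow", NA), nrow = 3, ncol = 2)

# Number of distinct values in each column (NA counts as one value)
n_values <- apply(x, 2, function(col) sum(!duplicated(col)))
n_values
# [1] 1 2

# A column is constant when it has exactly one distinct value
n_values == 1
# [1]  TRUE FALSE

# Same counts via unique()
apply(x, 2, function(col) length(unique(col)))
# [1] 1 2
```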

You can count the unique elements of a column:

 length(unique(x[,1]))==1 

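This works even if your data contain NA, because unique() keeps NA as a value of its own: an all-NA column returns TRUE, and a column mixing NA with another value returns FALSE.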

To check each column:

 apply(x, 2, function(a) length(unique(a))==1) 
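Note that apply() first converts a data.frame to a matrix, so columns of different types are coerced to a single type (typically character). For a data frame, a sketch that iterates over the columns directly with vapply(); the data frame `df` is made up for illustration.

```r
# Made-up data frame with numeric, character and all-NA columns
df <- data.frame(
  id   = c(7, 7, 7),
  name = c("Stack", "Stack", "Overflow"),
  flag = c(NA, NA, NA)
)

# vapply() loops over the columns without converting the data frame
# to a matrix, and enforces exactly one logical value per column
same <- vapply(df, function(col) length(unique(col)) == 1, logical(1))
same
```

Here `same` is a named logical vector: TRUE for id and flag, FALSE for name.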

I agree with @Richard Scriven's approach, all(x[-1, 1] == x[1, 1]), for character strings, factors, etc.

For numeric values, however, a tolerance-based comparison may be more robust, because floating-point results that should be equal can differ in the last bits:

 all.same <- function(x) {
   # x is assumed to be numeric; the tolerance is
   # .Machine$double.eps * 4 (about 8.881784e-16)
   abs(max(x) - min(x)) < .Machine$double.eps * 4
 }
 apply(x, 2, all.same)
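A small illustration of why the tolerance matters, using made-up values: 0.1 + 0.2 is not stored exactly as 0.3, so the exact comparison fails while all.same() accepts it. Keep in mind that an absolute tolerance of 4 * .Machine$double.eps only makes sense for values of magnitude around 1. Base R's all.equal(), shown as an alternative, uses a relative tolerance (sqrt(.Machine$double.eps) by default).

```r
all.same <- function(x) {
  # absolute tolerance of 4 * machine epsilon, as in the answer above
  abs(max(x) - min(x)) < 4 * .Machine$double.eps
}

v <- c(0.3, 0.1 + 0.2)

# Exact comparison fails: 0.1 + 0.2 is not stored exactly as 0.3
all(v == v[1])
# [1] FALSE

# The tolerance-based check treats the values as equal
all.same(v)
# [1] TRUE

# Alternative: all.equal() compares with a relative tolerance
isTRUE(all.equal(max(v), min(v)))
# [1] TRUE
```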

Comparison of the proposed methods:

 x <- rep(1, 1000)
 x[5] <- 0
 microbenchmark::microbenchmark(
   all(duplicated(x)),
   length(unique(x)) == 1,
   dim(table(x)) == 1,
   all(x == x[1]),
   times = 1000)
 # Unit: microseconds
 #                   expr      min       lq        mean   median       uq      max neval cld
 #     all(duplicated(x))   19.594   21.461   24.688356   22.861   24.727   74.646  1000  b
 # length(unique(x)) == 1   21.461   23.793   26.972993   25.193   26.127  156.755  1000  b
 #     dim(table(x)) == 1 1067.422 1090.282 1144.309131 1123.872 1154.197 2072.795  1000   c
 #         all(x == x[1])    3.267    4.199    4.629929    4.200    4.666   22.394  1000 a
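One caveat about the first benchmarked expression: duplicated() marks the first occurrence of every value as FALSE, so all(duplicated(x)) is FALSE for any non-empty vector, even a constant one. The intended check drops the first element, as sketched below; its timing was not part of the benchmark above.

```r
x <- rep(1, 1000)

# The first element is never a duplicate, so this is FALSE
# even though x is constant
all(duplicated(x))
# [1] FALSE

# Dropping the first element gives the intended check
all(duplicated(x)[-1])
# [1] TRUE

x[5] <- 0
all(duplicated(x)[-1])
# [1] FALSE
```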

Here x is a single column or row. For a matrix, data.frame or similar object, you can check whether every row is constant with the following (use 2 instead of 1 as the second argument of apply() to check columns):

 all(apply(X, 1, function(x){all(x == x[1])})) 
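A sketch of the row and column variants on a small made-up numeric matrix `X`; keeping the per-row result before calling all() also shows which rows differ.

```r
# Made-up example matrix: first row constant, second row not
X <- rbind(c(1, 1, 1),
           c(2, 2, 3))

# One TRUE/FALSE per row (MARGIN = 1)
row_same <- apply(X, 1, function(r) all(r == r[1]))
row_same
# [1]  TRUE FALSE

# Collapse to a single answer for the whole matrix
all(row_same)
# [1] FALSE

# Per column instead (MARGIN = 2): columns are (1, 2), (1, 2), (1, 3)
apply(X, 2, function(col) all(col == col[1]))
# [1] FALSE FALSE FALSE
```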
