Sort a matrix (or data.frame) by the number of unique values per column

How can I reorder the columns of a data.frame by the number of unique values in each column? As an example:

    var1 var2 var3
       1    1    1
       0    2    2
       1    3    3
       0    4    1
       1    5    2

Is there a way to automatically reorder the columns, for example as var2, var3, var1 (because the numbers of unique values are 5, 3 and 2, respectively), or in the reverse order (2, 3, 5)?

For this example it is easy enough to do by hand, but my real data has many columns. Is there a way to do this kind of sorting automatically?

Also, I would prefer a solution that works on a matrix (in addition to a data.frame), regardless of whether there are column names or not.

2 answers

Something like this?

    df[names(sort(sapply(df, function(x) length(unique(x))), decreasing = TRUE))]
    #   var2 var3 var1
    # 1    1    1    1
    # 2    2    2    0
    # 3    3    3    1
    # 4    4    1    0
    # 5    5    2    1

If your input is a matrix, then:

 m[, names(sort(apply(m, 2, function(x) length(unique(x))), decreasing = TRUE))] 

should work:

    #      var2 var3 var1
    # [1,]    1    1    1
    # [2,]    2    2    0
    # [3,]    3    3    1
    # [4,]    4    1    0
    # [5,]    5    2    1
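The snippets above assume that `df` and `m` already exist. As a minimal sketch, assuming the values from the table in the question, they could be rebuilt like this (the intermediate names `n_unique`, `df_sorted` and `m_sorted` are only for illustration):

```r
# Rebuild the question's example data (values copied from the question's table).
df <- data.frame(var1 = c(1, 0, 1, 0, 1),
                 var2 = 1:5,
                 var3 = c(1, 2, 3, 1, 2))
m <- as.matrix(df)

# Number of unique values in each column of the data.frame.
n_unique <- sapply(df, function(x) length(unique(x)))

# Reorder columns from most to fewest unique values, selecting by name.
df_sorted <- df[names(sort(n_unique, decreasing = TRUE))]
m_sorted  <- m[, names(sort(apply(m, 2, function(x) length(unique(x))),
                            decreasing = TRUE))]

names(df_sorted)
colnames(m_sorted)
```

Note that selecting by name only works when the columns are named.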

Edit: The example in your post has column names, but the one you gave in the comments does not. Please remember to provide a representative example.

 X <- cbind(1, rnorm(10), 1:10) 

Since you cannot rely on column names, you will need to work with the column indices instead. Try this (it works whether or not you have column names):

    X[, sort(apply(X, 2, function(x) length(unique(x))), decreasing = TRUE, index.return = TRUE)$ix]

Another solution, using order:

    dat[,order(apply(dat,2,function(x) length(unique(x))),decreasing = TRUE)]
    #   var2 var3 var1
    # 1    1    1    1
    # 2    2    2    0
    # 3    3    3    1
    # 4    4    1    0
    # 5    5    2    1

Now, if we remove the column names, we still get the right result, but with a warning:

    colnames(dat) <- NULL
    dat[,order(apply(dat,2,function(x) length(unique(x))),decreasing = TRUE)]
    #   NA NA NA
    # 1  1  1  1
    # 2  2  2  0
    # 3  3  3  1
    # 4  4  1  0
    # 5  5  2  1
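Because order returns positions rather than names, the same idea can be wrapped in a small function that should behave the same for a matrix or a data.frame, named or not. This is only a sketch: `sort_cols_by_unique` is a made-up name, not a base R function, and it assumes a plain matrix or data.frame with at least one column.

```r
# Hypothetical helper (the name is ours, not part of base R).
sort_cols_by_unique <- function(x, decreasing = TRUE) {
  # Count unique values per column by position, so names are never needed.
  n_unique <- vapply(seq_len(ncol(x)),
                     function(j) length(unique(x[, j])),
                     integer(1))
  # order() gives column positions; drop = FALSE keeps the input's class
  # even if only one column remains.
  x[, order(n_unique, decreasing = decreasing), drop = FALSE]
}

# Unnamed matrix whose columns have 1, 2 and 4 unique values.
X_small <- cbind(1, c(1, 2, 1, 2), 1:4)
sort_cols_by_unique(X_small)

# Named data.frame: column b (2 unique values) should come before a (1).
d_small <- data.frame(a = c(1, 1), b = 1:2)
names(sort_cols_by_unique(d_small))
```

Passing `decreasing = FALSE` gives the reverse order that the question also mentions.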

EDIT:

I tested both solutions on a matrix with 1000 columns. Their timings are comparable, with a small gain for order.

    X <- matrix(rnorm(100*1000),ncol=1000,nrow=100)
    Arun <- function() X[, sort(apply(X, 2, function(x) length(unique(x))), decreasing = TRUE, index.return = TRUE)$ix]
    AgStudy <- function() X[,order(apply(X,2,function(x) length(unique(x))),decreasing = TRUE)]
    library(microbenchmark)
    microbenchmark(Arun(),AgStudy())
    # Unit: milliseconds
    #        expr      min       lq   median       uq      max
    # 1 AgStudy() 28.04634 32.37105 34.73820 36.49930 129.6048
    # 2    Arun() 31.15476 32.97180 36.24027 37.91584 132.3871
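One caveat about this benchmark: with rnorm every column almost certainly has 100 distinct values, so all columns tie and nothing is actually reordered. As a rough sanity check, assuming integer test data instead (the seed, the `sample` range and the name `count_unique` are arbitrary choices for this sketch), one can confirm that both approaches arrange the columns by the same sequence of unique counts:

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible
X_int <- matrix(sample(1:50, 100 * 1000, replace = TRUE),
                nrow = 100, ncol = 1000)

# Unique-value count for every column of a matrix.
count_unique <- function(m) apply(m, 2, function(x) length(unique(x)))

by_sort  <- X_int[, sort(count_unique(X_int), decreasing = TRUE,
                         index.return = TRUE)$ix]
by_order <- X_int[, order(count_unique(X_int), decreasing = TRUE)]

# The per-column counts of both results should match and be non-increasing.
counts_sort  <- count_unique(by_sort)
counts_order <- count_unique(by_order)
identical(counts_sort, counts_order)
!is.unsorted(rev(counts_order))
```

This compares the counts rather than the matrices themselves, so it does not depend on how either function breaks ties between columns with equal counts.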