Sort matrix (or data.frame) by the number of unique values per column

Question


How can I reorder the columns of a data.frame by the number of unique values in each column? As an example:

    var1 var2 var3
       1    1    1
       0    2    2
       1    3    3
       0    4    1
       1    5    2

Is there a way to automatically reorder the columns, for example to var2, var3, var1 (because the numbers of unique values are 5, 3 and 2, respectively), or the other way round (2, 3, 5)?

In this case, it is not so difficult to get what we want, but in my case I have many columns. Is there a way to do this type of sorting automatically?

Also, I would prefer a solution that works on a matrix (in addition to a data.frame), regardless of whether there are column names or not.

+4

sorting matrix r order dataframe

PascalVKooten Mar 6 '13 at 11:04


2 answers

Another solution, using order:

    dat[,order(apply(dat,2,function(x) length(unique(x))),decreasing = TRUE)]
      var2 var3 var1
    1    1    1    1
    2    2    2    0
    3    3    3    1
    4    4    1    0
    5    5    2    1
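To run this end to end, here is a minimal sketch that rebuilds the question's data first. The name `dat` follows the usage above; the intermediate names `n_unique` and `res` are only illustrative.

```r
# Data from the question; the name `dat` follows this answer's usage.
dat <- data.frame(var1 = c(1, 0, 1, 0, 1),
                  var2 = 1:5,
                  var3 = c(1, 2, 3, 1, 2))

# Number of distinct values in each column (illustrative name).
n_unique <- apply(dat, 2, function(x) length(unique(x)))

# order() yields column positions, largest count first.
res <- dat[, order(n_unique, decreasing = TRUE)]
names(res)   # "var2" "var3" "var1"
```

Using `decreasing = FALSE` instead should give the reverse ordering (2, 3, 5) mentioned in the question.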

Now, if we remove the column names, we still get the correct result, but with a warning (the header prints as NA):

    colnames(dat) <- NULL
    dat[,order(apply(dat,2,function(x) length(unique(x))),decreasing = TRUE)]
      NA NA NA
    1  1  1  1
    2  2  2  0
    3  3  3  1
    4  4  1  0
    5  5  2  1

EDIT:

I tested on a matrix with 1000 columns. The timings of the two solutions are comparable, with a small gain for order.

    X <- matrix(rnorm(100*1000),ncol=1000,nrow=100)
    Arun <- function()
      X[, sort(apply(X, 2, function(x) length(unique(x))), decreasing = TRUE, index.return = TRUE)$ix]
    AgStudy <- function()
      X[,order(apply(X,2,function(x) length(unique(x))),decreasing = TRUE)]
    library(microbenchmark)
    microbenchmark(Arun(),AgStudy())
    Unit: milliseconds
           expr      min       lq   median       uq      max
    1 AgStudy() 28.04634 32.37105 34.73820 36.49930 129.6048
    2    Arun() 31.15476 32.97180 36.24027 37.91584 132.3871

+5

agstudy Mar 6 '13 at 11:33


Arun · Accepted Answer · 2013-03-06T11:07:20+0000

Something like this?

    df[names(sort(sapply(df, function(x) length(unique(x))), decreasing = TRUE))]
    #   var2 var3 var1
    # 1    1    1    1
    # 2    2    2    0
    # 3    3    3    1
    # 4    4    1    0
    # 5    5    2    1

If your input is a matrix, then:

    m[, names(sort(apply(m, 2, function(x) length(unique(x))), decreasing = TRUE))]

should work.

    #      var2 var3 var1
    # [1,]    1    1    1
    # [2,]    2    2    0
    # [3,]    3    3    1
    # [4,]    4    1    0
    # [5,]    5    2    1

Edit: The example in your post has column names, but the one you gave in the comments does not. Remember to provide a representative example.

 X <- cbind(1, rnorm(10), 1:10)
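A quick check of why the name-based version cannot be used on this matrix (a sketch; `set.seed` is added only so the run is repeatable, and `counts` is an illustrative name):

```r
set.seed(42)                     # assumption: only for reproducibility
X <- cbind(1, rnorm(10), 1:10)   # matrix without column names

# Distinct values per column; the result carries no names.
counts <- apply(X, 2, function(x) length(unique(x)))
names(counts)                    # NULL

# Indexing by names(...) therefore selects no columns at all.
by_name <- X[, names(sort(counts, decreasing = TRUE))]
dim(by_name)                     # 10 0
```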

Since you cannot rely on column names, you need to work with column indices instead. Try this (it works whether or not the matrix has column names):

    X[, sort(apply(X, 2, function(x) length(unique(x))), decreasing = TRUE, index.return = TRUE)$ix]
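If you need this for both a matrix and a data.frame, the index-based idea could be wrapped in a small helper. This is only a sketch: `sort_cols_by_unique` is a hypothetical name, and it swaps in `order()` and `vapply()` in place of `sort(..., index.return = TRUE)` and `apply()`.

```r
# Hypothetical helper: reorder columns by their number of distinct values.
sort_cols_by_unique <- function(x, decreasing = TRUE) {
  # Count distinct values column by column; iterating over positions
  # avoids any dependence on column names.
  n_unique <- vapply(seq_len(ncol(x)),
                     function(j) length(unique(x[, j])),
                     integer(1))
  # order() returns column positions; drop = FALSE keeps a matrix a matrix
  # (and a data.frame a data.frame) even with a single column.
  x[, order(n_unique, decreasing = decreasing), drop = FALSE]
}

set.seed(1)                      # assumption: only for reproducibility
X <- cbind(1, rnorm(10), 1:10)   # unnamed matrix from the comments
res_m <- sort_cols_by_unique(X)

df <- data.frame(var1 = c(1, 0, 1, 0, 1), var2 = 1:5, var3 = c(1, 2, 3, 1, 2))
res_df <- sort_cols_by_unique(df)
names(res_df)                    # "var2" "var3" "var1"
```

Passing `decreasing = FALSE` should give the ascending order instead. Columns with the same count (such as the two 10-value columns of `X`) are tied, so their relative order is not something to depend on.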
