How to apply wilcox.test to a whole data frame in R?

I have a data frame with one grouping factor (the first column) with several levels (more than two) and several data columns. I want to apply wilcox.test to the entire data frame to compare each variable between the groups. How can I do this?

UPDATE: I know that wilcox.test only tests the difference between two groups, and my data frame contains three. But I'm more interested in how to do this than in which test should be used. Most likely one group will be dropped, but I have not decided yet, so I want to check all the options.

Here is an example:

structure(list(group = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), var1 = c(9.3, 9.05, 7.78, 7.11, 7.14, 8.12, 7.5, 7.84, 7.8, 7.52, 8.84, 6.98, 6.1, 6.89, 6.5, 7.5, 7.8, 5.5, 6.61, 7.65, 7.68), var2 = c(11L, 11L, 10L, 1L, 3L, 7L, 11L, 11L, 11L, 11L, 4L, 1L, 1L, 1L, 2L, 2L, 1L, 4L, 8L, 8L, 1L), var3 = c(7L, 11L, 3L, 7L, 11L, 2L, 11L, 5L, 11L, 11L, 5L, 11L, 11L, 2L, 9L, 9L, 3L, 8L, 11L, 11L, 2L), var4 = c(11L, 11L, 11L, 11L, 6L, 11L, 11L, 11L, 10L, 7L, 11L, 2L, 11L, 3L, 11L, 11L, 6L, 11L, 1L, 11L, 11L), var5 = c(11L, 1L, 2L, 2L, 11L, 11L, 1L, 10L, 2L, 11L, 1L, 3L, 11L, 11L, 8L, 8L, 11L, 11L, 11L, 2L, 9L)), .Names = c("group", "var1", "var2", "var3", "var4", "var5"), class = "data.frame", row.names = c(NA, -21L)) 

UPDATE

Thanks everyone for the answers!

r statistics
3 answers

Updating my answer to work on columns

    test.fun <- function(dat, col) {
      # all pairs of group levels, one pair per column of c1
      c1 <- combn(unique(dat$group), 2)
      sigs <- list()
      for (i in seq_len(ncol(c1))) {
        sigs[[i]] <- wilcox.test(
          dat[dat$group == c1[1, i], col],
          dat[dat$group == c1[2, i], col]
        )
      }
      names(sigs) <- paste("Group", c1[1, ], "by Group", c1[2, ])

      # collect the test statistic and p-value of every comparison
      tests <- data.frame(Test = names(sigs),
                          W = unlist(lapply(sigs, function(x) x$statistic)),
                          p = unlist(lapply(sigs, function(x) x$p.value)),
                          row.names = NULL)
      return(tests)
    }

    tests <- lapply(colnames(dat)[-1], function(x) test.fun(dat, x))
    names(tests) <- colnames(dat)[-1]
    # tests <- do.call(rbind, tests)  # combines the list into one data.frame

    # This solution is not "slow" and outperforms the other answers significantly:
    # (caveat: rep() evaluates its argument once and repeats the result, so this
    # times a single run; replicate() would re-run the expression each time)
    system.time(
      rep(tests <- lapply(colnames(dat)[-1], function(x) test.fun(dat, x)), 10000)
    )
    #  user  system elapsed
    # 0.056   0.000   0.053

And the result:

    tests
    $var1
                    Test  W          p
    1 Group 1 by Group 2 28 0.36596737
    2 Group 1 by Group 3 39 0.05927406
    3 Group 2 by Group 3 38 0.27073136

    $var2
                    Test    W         p
    1 Group 1 by Group 2 19.0 0.8205958
    2 Group 1 by Group 3 36.5 0.1159945
    3 Group 2 by Group 3 40.5 0.1522726

    $var3
                    Test    W         p
    1 Group 1 by Group 2 13.0 0.2425786
    2 Group 1 by Group 3 23.5 1.0000000
    3 Group 2 by Group 3 41.0 0.1261647

    $var4
                    Test  W         p
    1 Group 1 by Group 2 26 0.4323470
    2 Group 1 by Group 3 30 0.3729664
    3 Group 2 by Group 3 29 0.9479518

    $var5
                    Test    W         p
    1 Group 1 by Group 2 24.0 0.7100968
    2 Group 1 by Group 3 19.0 0.5324295
    3 Group 2 by Group 3 17.5 0.2306609

The pairwise.wilcox.test function seems useful here; maybe something like this?

    out <- lapply(2:6, function(x) pairwise.wilcox.test(d[[x]], d$group))
    names(out) <- names(d)[2:6]
    out

If you just want p-values, you can go through and extract them and create a matrix.

    sapply(out, function(x) {
      p <- x$p.value
      n <- outer(rownames(p), colnames(p), paste, sep = 'v')
      p <- as.vector(p)
      names(p) <- n
      p
    })
    ##          var1      var2      var3 var4      var5
    ## 2v1 0.5414627 0.8205958 0.4851572    1 1.0000000
    ## 3v1 0.1778222 0.3479835 1.0000000    1 1.0000000
    ## 2v2        NA        NA        NA   NA        NA
    ## 3v2 0.5414627 0.3479835 0.3784941    1 0.6919826

Also note that pairwise.wilcox.test adjusts for multiple comparisons using the Holm method by default; if you prefer something else, look at the p.adjust.method parameter.
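As a minimal sketch of switching the correction, the snippet below runs the pairwise test on var1 from the question with the default adjustment and with Bonferroni. The name d follows the code above; exact = FALSE is an assumption added here (it is passed through to wilcox.test) to avoid the warnings about ties, so the p-values will not match the output above exactly.

```r
# Group labels and var1 copied from the question's example data
d <- data.frame(
  group = rep(1:3, times = c(6, 7, 8)),
  var1  = c(9.3, 9.05, 7.78, 7.11, 7.14, 8.12,
            7.5, 7.84, 7.8, 7.52, 8.84, 6.98, 6.1,
            6.89, 6.5, 7.5, 7.8, 5.5, 6.61, 7.65, 7.68)
)

# Default adjustment (Holm)
holm <- pairwise.wilcox.test(d$var1, d$group, exact = FALSE)

# Same comparisons with a Bonferroni adjustment instead
bonf <- pairwise.wilcox.test(d$var1, d$group,
                             p.adjust.method = "bonferroni", exact = FALSE)

holm$p.value
bonf$p.value
```

Setting p.adjust.method = "none" returns the unadjusted p-values.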


You can loop through the columns with apply and pass each column to whatever test you want to use via an anonymous function, for example (if the data frame is called df):

 apply(df[-1],2,function(x) kruskal.test(x,df$group)) 

Note. I used the Kruskal-Wallis test because it works on several groups. The above would work just as well using the Wilcoxon test if there were only two groups.
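If only the p-values are needed, one possible sketch is to pull them out with sapply. Only the first two data columns of the question's example are used here for brevity, and kw_p is a name introduced for illustration.

```r
# Grouping column plus var1 and var2 from the question's example data
df <- data.frame(
  group = rep(1:3, times = c(6, 7, 8)),
  var1  = c(9.3, 9.05, 7.78, 7.11, 7.14, 8.12,
            7.5, 7.84, 7.8, 7.52, 8.84, 6.98, 6.1,
            6.89, 6.5, 7.5, 7.8, 5.5, 6.61, 7.65, 7.68),
  var2  = c(11, 11, 10, 1, 3, 7, 11, 11, 11, 11, 4, 1, 1,
            1, 2, 2, 1, 4, 8, 8, 1)
)

# One Kruskal-Wallis p-value per data column; sapply keeps the column names
kw_p <- sapply(df[-1], function(x) kruskal.test(x, df$group)$p.value)
kw_p
```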

If you want to run pairwise Wilcoxon tests for all variables, here is a two-liner that goes through all columns and all group pairs and returns the results in a list:

    group.pairs <- combn(unique(df$group), 2, simplify = FALSE)
    # this loops over the 2nd margin - the columns - of df and makes each column
    # available as x
    apply(df[-1], 2, function(x)
      # this loops over the list of group pairs and makes each such pair
      # available as an integer vector y; the two samples are the values of x
      # in group y[1] and in group y[2]
      lapply(group.pairs, function(y)
        wilcox.test(x[df$group == y[1]], x[df$group == y[2]])))
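A compact variant that keeps only the p-values might look like the sketch below, which compares each variable's values in the first group of a pair against its values in the second. The name p.mat and the row labels are introduced for illustration, only two data columns of the question's example are used, and exact = FALSE is an assumption to avoid the warnings about ties.

```r
# Grouping column plus var1 and var2 from the question's example data
df <- data.frame(
  group = rep(1:3, times = c(6, 7, 8)),
  var1  = c(9.3, 9.05, 7.78, 7.11, 7.14, 8.12,
            7.5, 7.84, 7.8, 7.52, 8.84, 6.98, 6.1,
            6.89, 6.5, 7.5, 7.8, 5.5, 6.61, 7.65, 7.68),
  var2  = c(11, 11, 10, 1, 3, 7, 11, 11, 11, 11, 4, 1, 1,
            1, 2, 2, 1, 4, 8, 8, 1)
)

group.pairs <- combn(unique(df$group), 2, simplify = FALSE)

# Outer sapply: one column of the result per variable.
# Inner sapply: one unadjusted p-value per pair of groups.
p.mat <- sapply(df[-1], function(x)
  sapply(group.pairs, function(y)
    wilcox.test(x[df$group == y[1]], x[df$group == y[2]],
                exact = FALSE)$p.value))
rownames(p.mat) <- sapply(group.pairs, paste, collapse = "v")
p.mat
```

These p-values are not corrected for multiple comparisons; p.adjust can be applied to them afterwards if needed.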
