Error when passing a custom aggregation function to cast() in R reshape2

I want to use R to summarize the numeric data in a table with non-unique row names, producing a result table with unique row names whose values are summarized by a custom function. Summarization logic: use the mean if the ratio of the maximum to the minimum value is < 1.5, otherwise use the median. Since the table is very large, I am trying to use the melt() and cast() functions in the reshape2 package.

 library(reshape2)
 # example table with non-unique row names
 tab <- data.frame(gene = rep(letters[1:3], each = 3), s1 = runif(9), s2 = runif(9))
 # melt
 tab.melt <- melt(tab, id = 1)
 # function to summarize with logic: mean if max / min < 1.5, else median
 summarize <- function(x) {ifelse(max(x) / min(x) < 1.5, mean(x), median(x))}
 # cast with summarized values
 dcast(tab.melt, gene ~ variable, summarize)

The last line of code produces an error message:

 Error in vapply(indices, fun, .default) :
   values must be type 'logical',
  but FUN(X[[1]]) result is type 'double'
 In addition: Warning messages:
 1: In max(x) : no non-missing arguments to max; returning -Inf
 2: In min(x) : no non-missing arguments to min; returning Inf

What am I doing wrong? Note that if the summary function simply returns min(x) or max(x), there is no error, although there is still a warning about "no non-missing arguments". Thanks for any suggestions.

(The actual table I want to work with is 200x10000).

2 answers

Short answer: specify a fill value, like this: acast(tab.melt, gene ~ variable, summarize, fill = 0)

Long answer: It seems your function gets wrapped like this before it is passed to vapply inside the vaggregate function (dcast calls cast, which calls vaggregate, which calls vapply):

 fun <- function(i) {
   if (length(i) == 0) return(.default)
   .fun(.value[i], ...)
 }

To find out what .default should be, this code is executed

 if (is.null(.default)) { .default <- .fun(.value[0]) } 

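that is, .value[0] is passed to the function. When x is numeric(0), max(x) returns -Inf and min(x) returns Inf, so max(x) / min(x) is NaN. The comparison NaN < 1.5 then evaluates to NA, and ifelse() with an NA test returns a logical NA. So .default ends up being a logical value. When vapply is then executed

 vapply(indices, fun, .default) 

it uses .default as its template, and the template's type is logical. The call fails as soon as the function starts returning doubles. This is also why returning plain min(x) or max(x) works: the default is then Inf or -Inf, which is a double.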
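
The type mismatch can be reproduced in base R without reshape2. This is only a minimal sketch, assuming the summarize function from the question. The names empty, ratio, default and groups are made up for illustration, and the small list stands in for the index groups that vaggregate would build.

```r
# Same function as in the question.
summarize <- function(x) {
  ifelse(max(x) / min(x) < 1.5, mean(x), median(x))
}

# With fill = NULL, the default is the function applied to a
# zero-length numeric vector.
empty <- numeric(0)
ratio <- suppressWarnings(max(empty) / min(empty))  # -Inf / Inf
default <- suppressWarnings(summarize(empty))

is.nan(ratio)    # TRUE: the ratio itself is a double NaN
typeof(default)  # "logical": ifelse() returns NA when its test is NA

# vapply() checks every result against the template's type,
# so a double result is rejected when the template is logical.
groups <- list(a = c(1, 2, 3), b = c(4, 5, 6))  # illustrative data
res <- tryCatch(
  vapply(groups, summarize, default),
  error = function(e) conditionMessage(e)
)
print(res)

# With a double template the same call goes through.
ok <- vapply(groups, summarize, NA_real_)
print(ok)
```

Passing fill = 0 or fill = NaN has the same effect as the last call: the template becomes a double.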


dcast() tries to set the value of any missing combination to a default.

You can specify this with the fill argument, but if fill = NULL, the value returned by fun applied to a zero-length vector (i.e., summarize(numeric(0)) here) is used as the default.

See ?dcast.

So here is a workaround:

  dcast(tab.melt, gene~variable, summarize, fill=NaN) 
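
Another option, not taken from either answer, is to make the function itself return a double for empty input, so that the default computed when fill = NULL already has the right type. The sketch below is only an assumption about how that could look. summarize2 is a hypothetical name, and the reshape2 part only runs if the package is installed.

```r
# Hypothetical variant of the question's function: plain if/else
# instead of ifelse(), with an explicit double NA for empty input.
summarize2 <- function(x) {
  if (length(x) == 0) return(NA_real_)  # double, so the vapply template is double
  if (max(x) / min(x) < 1.5) mean(x) else median(x)
}

typeof(summarize2(numeric(0)))  # "double"
summarize2(c(1, 1.2, 1.4))      # ratio is 1.4, so the mean is used
summarize2(c(1, 2, 10))         # ratio is 10, so the median is used

# Usage with the question's example table (skipped if reshape2 is missing).
if (requireNamespace("reshape2", quietly = TRUE)) {
  set.seed(1)
  tab <- data.frame(gene = rep(letters[1:3], each = 3),
                    s1 = runif(9), s2 = runif(9))
  tab.melt <- reshape2::melt(tab, id = 1)
  print(reshape2::dcast(tab.melt, gene ~ variable, summarize2))
}
```

Missing combinations then show up as NA in the result unless a fill value is given explicitly.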
