R tapply with a NULL function

I am having trouble understanding the tapply function when the FUN argument is NULL.

The documentation says:

If FUN is NULL, tapply returns a vector which can be used to subscript the multi-way array tapply normally produces.

For example, what does the following example from the documentation do?

 ind <- list(c(1, 2, 2), c("A", "A", "B"))
 tapply(1:3, ind) #-> the split vector

I do not understand the results:

 [1] 1 2 4 

Thanks.

1 answer

If you run tapply with an actual function (not NULL), say sum, as in the help page, you will see that the result is a two-dimensional array with NA in one cell:

 res <- tapply(1:3, ind, sum)
 res
   A  B
 1 1 NA
 2 2  3

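This means that one combination of factor levels, namely (1, B), does not occur in the data. When FUN is NULL, tapply instead returns, for each element of the input, the linear (column-major) index of the cell of that array into which the element falls. Here the three elements fall into cells (1, A), (2, A) and (2, B), which are positions 1, 2 and 4, so only combinations that actually occur show up. To check this: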

 > which(!is.na(res))
 [1] 1 2 4
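The link between these indices and the array can also be reproduced by hand. The following is a minimal sketch, not taken from the documentation: the names idx, rows, cols and manual are made up for illustration, and the formula assumes the usual column-major layout of R arrays.

```r
ind <- list(c(1, 2, 2), c("A", "A", "B"))
res <- tapply(1:3, ind, sum)   # the 2 x 2 array shown above
idx <- tapply(1:3, ind)        # FUN = NULL: one cell index per element

# Rebuild the indices manually: row + (column - 1) * number of rows
rows <- as.integer(factor(ind[[1]]))   # level codes of the first factor
cols <- as.integer(factor(ind[[2]]))   # level codes of the second factor
manual <- rows + (cols - 1L) * nlevels(factor(ind[[1]]))

print(idx)        # 1 2 4
print(manual)     # 1 2 4
print(res[idx])   # 1 2 3: each element looks up its own cell
```

Subscripting res with idx returns, for every input element, the value of the cell it belongs to, which is what the documentation sentence about subscripting the array refers to.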

Note, however, that the function passed as FUN can itself return NA, as in the following toy example:

 > f <- function(x){
 +   if(x[[1]] == 1) return(NA)
 +   return(sum(x))
 + }
 > tapply(1:3, ind, f)
    A  B
 1 NA NA
 2  2  3

Thus, in the general case, NA does not mean that the factor combination is absent.
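If the goal is to find which cells are actually occupied, the indices returned with FUN = NULL do not depend on what any summary function returns. A small sketch, reusing the toy function f from above (the variable names by_na and occupied are only illustrative):

```r
ind <- list(c(1, 2, 2), c("A", "A", "B"))
f <- function(x) {
  if (x[[1]] == 1) return(NA)
  sum(x)
}
res <- tapply(1:3, ind, f)

# Checking for NA misses cell 1, because f returned NA there
by_na <- which(!is.na(res))

# The FUN = NULL indices list every cell that holds at least one element
occupied <- sort(unique(tapply(1:3, ind)))

print(by_na)      # 2 4
print(occupied)   # 1 2 4
```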
