Why can't I use FUN in lapply when grouping with data.table?

I ran into the following problem with data.table when aggregating over .SD with lapply and naming the FUN argument explicitly. Is this unexpected behavior, or did I just miss something? Why can't I name FUN explicitly in the call? The following is a reproducible example.

    require(data.table)
    dt <- as.data.table(iris)
    dt$Sepal.Length[sample(1:nrow(dt), 10)] <- NA

    dt[, lapply(.SD, function(x) sum(!is.na(x), na.rm=TRUE)), by = Species]
    #       Species Sepal.Length Sepal.Width Petal.Length Petal.Width
    # 1:     setosa           47          50           50          50
    # 2: versicolor           46          50           50          50
    # 3:  virginica           47          50           50          50

    dt[, lapply(.SD, FUN=function(x) sum(!is.na(x), na.rm=TRUE)), by = Species]
    # Error in ..FUN(FUN = Sepal.Length) : unused argument(s) (FUN = Sepal.Length)

Update:

Filed as a bug: #4839. (Arun: this is now fixed in version 1.8.9.)

1 answer

I don't think you are missing anything. You should probably file a bug report that refers to this post. Good catch!

This happens because, when you use lapply with .SD (in j), data.table tries to optimize away the overhead of the repeated function calls where possible. In that process, however, instead of building the call:

 ..FUN(Sepal.Length) 

where ..FUN = function(x) sum(!is.na(x), na.rm=TRUE), it builds:

 ..FUN(FUN = Sepal.Length) 

Since the function has no argument called FUN, the call fails with an error. You can verify this by renaming the parameter from x to FUN in the function definition:

    dt[, lapply(.SD, FUN=function(FUN) sum(!is.na(FUN), na.rm=TRUE)), by = Species]
    #       Species Sepal.Length Sepal.Width Petal.Length Petal.Width
    # 1:     setosa           49          50           50          50
    # 2: versicolor           44          50           50          50
    # 3:  virginica           47          50           50          50
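The same mismatch can be reproduced in base R, without data.table. This is only a minimal sketch: `count_non_na` and `count_non_na_fun` are hypothetical helper names, and the calls imitate the shape of the call shown in the error message rather than data.table's actual internals.

```r
# Hypothetical helpers, for illustration only (not part of data.table).
count_non_na     <- function(x)   sum(!is.na(x))
count_non_na_fun <- function(FUN) sum(!is.na(FUN))

v <- c(1, NA, 3)

# Positional call: v is matched to the parameter x.
ok_positional <- count_non_na(v)

# Named call, imitating ..FUN(FUN = Sepal.Length): the function has no
# parameter called FUN, so R signals an "unused argument" error.
err_msg <- tryCatch(
  count_non_na(FUN = v),
  error = function(e) conditionMessage(e)
)

# Once the parameter itself is called FUN, the same named call matches.
ok_named <- count_non_na_fun(FUN = v)
```

Here `ok_positional` and `ok_named` both count the two non-missing values, while `err_msg` holds the text of the error raised by the mismatched name.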

Internals: if you look at the function `[.data.table`, one way to fix this is to change the line:

    txt <- as.list(jsub)[-1L]
    # [[1]]
    # .SD
    # $FUN   <~~~~ this name FUN gets caught up in building the expression later
    # function(x) sum(!is.na(x), na.rm = TRUE)

to:

    txt <- as.list(jsub)[-1L]
    names(txt)[2] <- ""
    # [[1]]
    # .SD
    # [[2]]
    # function(x) sum(!is.na(x), na.rm = TRUE)
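The effect of clearing that name can be checked in base R on a quoted copy of the j expression. This is a sketch under assumptions: `jsub` is built by hand with `quote()`, and the two `as.call()` lines only illustrate how a leftover name could travel into a constructed call. They are not the code data.table actually runs.

```r
# The j expression from the question, captured unevaluated.
jsub <- quote(lapply(.SD, FUN = function(x) sum(!is.na(x), na.rm = TRUE)))

# Drop the function name (lapply); the argument names are kept.
txt <- as.list(jsub)[-1L]
names_before <- names(txt)   # second element is still named "FUN"

# The proposed change: blank out the name of the second element.
names(txt)[2] <- ""
names_after <- names(txt)

# Assumption, for illustration: a call built from a named list element
# keeps the name, while one built from an unnamed element does not.
leaky <- as.call(list(as.name("..FUN"), FUN = as.name("Sepal.Length")))
clean <- as.call(list(as.name("..FUN"), as.name("Sepal.Length")))
```

Inspecting `names(leaky)` and `names(clean)` shows that only the first call carries the `FUN` tag, which matches the difference between the failing and the intended call.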

With this change, running CHECK completed successfully.
