Using apply functions in SparkR

I am currently trying to implement some functions using SparkR version 1.5.1. I have seen older (version 1.3) examples where people used apply functions on DataFrames, but it looks like this is no longer directly available. Example:

 x = c(1, 2)
 xDF_R = data.frame(x)
 colnames(xDF_R) = c("number")
 xDF_S = createDataFrame(sqlContext, xDF_R)

Now I can use the sapply function on the data.frame object:

 xDF_R$result = sapply(xDF_R$number, ppois, q=10) 

But when I use the same logic on a DataFrame:

 xDF_S$result = sapply(xDF_S$number, ppois, q=10) 

I get the error "Error in as.list.default(X) : no method for coercing this S4 class to a vector".

Can I do this somehow?
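
For reference, one workaround that stays within 1.5.x, assuming the data is small enough to collect to the driver, is to pull the DataFrame into a local data.frame, apply the function there, and convert back (a sketch, not a distributed solution):

 # collect the distributed DataFrame to a local R data.frame
 local_df = collect(xDF_S)
 # apply the function locally, as with any data.frame
 local_df$result = sapply(local_df$number, ppois, q = 10)
 # convert the result back to a distributed DataFrame
 xDF_S = createDataFrame(sqlContext, local_df)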

1 answer

This is possible with user-defined functions in Spark 2.0, using dapplyCollect:

 wrapper = function(df) {
   out = df
   out$result = sapply(df$number, ppois, q = 10)
   return(out)
 }

 xDF_S2 = dapplyCollect(xDF_S, wrapper)
 identical(xDF_S2, xDF_R)
 # [1] TRUE

Note that you need a wrapper function like this because additional arguments cannot be passed to the applied function directly, although this may change in the future.
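
If you want the result to stay distributed rather than being collected to the driver, dapply takes the same kind of wrapper plus an explicit output schema. A minimal sketch along the same lines, assuming both columns are doubles:

 # describe the columns of the data.frame the wrapper returns
 schema = structType(structField("number", "double"),
                     structField("result", "double"))
 # apply the wrapper to each partition; returns a distributed DataFrame
 xDF_S3 = dapply(xDF_S, wrapper, schema)
 head(xDF_S3)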

