In R: Error in is.data.frame (data): object '' not found, plot C5.0

This question is similar to some other questions on Stack Overflow ( here , here and here ), but different enough that I cannot apply those answers to my case.

I have a function in which I fit a C5.0 model and then try to plot it.

```r
train_d <- globald[train_ind, c(features, 21)]
model <- C5.0(binclass ~ ., data = train_d, trials = 10)
```

`binclass` is the name of the class column in my training/test data (`globald` is the data frame from which I subset rows with row indices such as `train_ind` and columns `c(3:12, 21)`, where column 21 is `binclass`). Fitting the model works fine. However, when I also add the line

```r
plot(model, trial = 0)
```

then I get the following error: `Error in is.data.frame(data) : object 'train_d' not found`.

How is it possible that `train_d` is found and used correctly when fitting the model, but cannot be found when plotting? Any suggestions on how to solve this? Namespaces in R remain a mystery to me.

A minimal reproducible example is as follows:

```r
f <- function() {
  library(C50)
  set.seed(1)
  class <- c(1, 2)
  d <- data.frame(feature1 = sample(1:10, 10, replace = TRUE),
                  feature2 = 1:10,
                  binclass = class)
  d$binclass <- as.factor(d$binclass)
  model <- C5.0(binclass ~ ., data = d)
  plot(model)
}
```

Calling `f()` results in the following error: `Error in is.data.frame(data) : object 'd' not found`

Edit: According to MrFlick's answer, the cause of this problem seems to be a bug in the C5.0 code. Pascal and MrFlick point to some workarounds.

2 answers

There seems to be a bug in the code when it comes to evaluating a call in the right environment. The problem is the function `C50::model.frame.C5.0`. The "cleanest" workaround I could find was to add a `terms` property to your model. This helps capture the function's environment.

```r
f <- function() {
  library(C50)
  set.seed(1)
  class <- c(1, 2)
  d <- data.frame(feature1 = sample(1:10, 10, replace = TRUE),
                  feature2 = 1:10,
                  binclass = class)
  d$binclass <- as.factor(d$binclass)
  model <- C5.0(binclass ~ ., data = d)
  model$terms <- eval(model$call$formula)  # <---- Added line
  plot(model)
}
```
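The scoping problem behind this bug can be reproduced without C50. The sketch below uses base R's `lm()` as a stand-in, and the helper `fit_inside()` is hypothetical, not part of any package. The stored call only records the *name* `train_d`, so evaluating it after the function has returned fails, while the environment attached to the model's `terms` still holds the data. That environment is presumably what adding `model$terms` lets C50 find, although this sketch does not exercise C50 itself.

```r
# Hypothetical helper: fits a model on a data frame that exists only locally
fit_inside <- function() {
  train_d <- data.frame(y = c(1, 3, 2, 5), x = c(1, 2, 3, 4))
  lm(y ~ x, data = train_d)
}

fit <- fit_inside()

# fit$call$data is just the symbol `train_d`; evaluating it from the calling
# environment fails because train_d only existed inside fit_inside()'s frame.
msg <- tryCatch(eval(fit$call$data), error = function(e) conditionMessage(e))
print(msg)

# The terms object remembers the environment in which the formula was created
# (fit_inside()'s frame), so evaluating the same symbol there succeeds.
env <- environment(fit$terms)
recovered <- eval(fit$call$data, envir = env)
print(dim(recovered))
```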

@MrFlick almost had this, but not quite. This plotting problem is particularly annoying when trying to pass arbitrary data and target features to the C50 method. As MrFlick noted, it comes down to how the terms in the stored call are named. If you overwrite the `x` and `y` entries of the model's call with the actual data, the plotting function no longer gets confused.

```r
tree_model$call$x <- data_train[, -target_index]
tree_model$call$y <- data_train[[target_feature]]
```
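This workaround relies on the fact that `eval()` returns non-language values unchanged: once the stored call contains the data itself rather than a symbol, it evaluates correctly from any environment. A minimal base-R sketch, again with `lm()` standing in for `C5.0()` and a hypothetical helper `fit_embedded()`:

```r
# Hypothetical helper: fits locally, then embeds the data in the stored call
fit_embedded <- function() {
  train_d <- data.frame(y = c(1, 3, 2, 5), x = c(1, 2, 3, 4))
  fit <- lm(y ~ x, data = train_d)
  # Replace the symbol `train_d` in the call with the data frame itself
  fit$call$data <- train_d
  fit
}

fit2 <- fit_embedded()

# The call now carries the data frame, so no lookup by name is needed
recovered2 <- eval(fit2$call$data)
print(nrow(recovered2))
```

The trade-off is that the call now carries a full copy of the data, so anything that deparses it (for example, printing `fit2$call`) will show the entire data set.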

For example, here is a function that accepts arbitrary data and a target feature and returns a model that can be plotted:

```r
library(C50)

boosted_trees <- function(data_train, target_feature, iter_choice) {
  # Exact match on the column name
  target_index <- which(colnames(data_train) == target_feature)
  model_boosted <- C5.0(x = data_train[, -target_index],
                        y = data_train[[target_feature]],
                        trials = iter_choice)
  # Store the actual data in the call so plot() does not have to look it up by name
  model_boosted$call$x <- data_train[, -target_index]
  model_boosted$call$y <- data_train[[target_feature]]
  return(model_boosted)
}
```
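When locating the target column, keep in mind that `grep()` interprets its pattern as a regular expression and matches substrings, so a target name such as `class` would also select a column like `binclass`. A small illustration with made-up column names:

```r
cols <- c("feature1", "binclass", "class")

# grep() does regular-expression substring matching: two columns contain "class"
idx_grep <- grep("class", cols)

# An exact comparison selects only the column with exactly that name
idx_exact <- which(cols == "class")

print(idx_grep)
print(idx_exact)
```

`match(target_feature, colnames(data_train))` is another exact alternative.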

The model object returned by this function can be used as normal:

```r
model <- boosted_trees(data_train, 'my_target', 10)
plot(model)
```
