C5.0 models require a factor outcome

I am working with credit.csv to build a decision tree. The data is available at:

https://github.com/stedy/Machine-Learning-with-R-datasets/blob/master/credit.csv

and I took the following steps:

credit<-read.csv("credit.csv")
set.seed(12345)
credit_rand<-credit[order(runif(1000)),]
credit_train<-credit_rand[1:900,]
credit_test<-credit_rand[901:1000,]
library(C50)
credit_model<-C5.0(credit_train[-21],credit_train$default)

The manual I am following says to drop the last column, which is the default, but I got the following error:

Error en C5.0.default(credit_train[, -21], credit_train$default) : 
  C5.0 models require a factor outcome

I tried changing the last line to:

credit_model<-C5.0(credit_train[,-21],credit_train$default)

but without success.

Any help?

2 answers

Your problem is exactly what the error says: C5.0 models require a factor outcome. You passed credit_train$default as the outcome. It is coded as 1/2, and R read it as an integer, not as a factor:

str(credit_train$default)
int [1:900] 2 1 1 1 2 1 2 2 1 1 ...

The solution is to convert it to a factor:

credit_train$default<-as.factor(credit_train$default)
str(credit_train$default)

Factor w/ 2 levels "1","2": 2 1 1 1 2 1 2 2 1 1 ...

And then run your training:

credit_model<-C5.0(credit_train[-21],credit_train$default)
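A minimal sketch of the same conversion done once, before the train/test split, so that both subsets share the same factor levels. The toy data frame stands in for credit.csv, and the labels "no"/"yes" for the codes 1/2 are an assumption made for illustration; check the coding in your own copy of the data.

```r
# Toy stand-in for credit.csv (hypothetical values, illustration only)
toy <- data.frame(amount  = c(1200, 5400, 800, 3100, 2500, 950),
                  default = c(1L, 2L, 1L, 1L, 2L, 1L))

# An integer outcome is what triggers "C5.0 models require a factor outcome"
was_factor <- is.factor(toy$default)

# Convert once, before splitting; the labels are an assumed mapping of 1/2
toy$default <- factor(toy$default, levels = c(1, 2), labels = c("no", "yes"))

toy_train <- toy[1:4, ]
toy_test  <- toy[5:6, ]

# Both subsets inherit the same levels from the parent data frame
levels(toy_train$default)
levels(toy_test$default)
```

Converting before the split also means the test set does not need a separate as.factor() call later.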

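In this version of the data, the outcome column ("default") is column 17, not column 21.

As noted above, "default" also has to be a factor (in my case with the levels "no" and "yes").

So, with a factor outcome, drop column 17 from the predictors, then train and predict: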

credit_model <- C5.0(credit_train[-17], credit_train$default)

credit_pred <- predict(credit_model, credit_test)
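One way to avoid the 17-versus-21 confusion is to find the outcome column by name instead of by position. This is only a sketch on a toy data frame with hypothetical columns; the commented-out last line shows how it might carry over to the real data, assuming the outcome column is named "default".

```r
# Toy data frame with the outcome in the middle (hypothetical columns)
toy <- data.frame(amount  = c(1200, 5400, 800),
                  default = factor(c("no", "yes", "no")),
                  job     = c("skilled", "unskilled", "skilled"))

# Locate the outcome by name instead of hard-coding 17 or 21
outcome_col <- which(names(toy) == "default")

# Predictors are every column except the outcome
predictors <- toy[, -outcome_col, drop = FALSE]
names(predictors)

# With the real data this would become (not run here):
# credit_model <- C5.0(credit_train[, -outcome_col], credit_train$default)
```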

Comparing the predictions with the actual values on the test set gives:

# > CrossTable(credit_test$default, credit_pred,
# +            prop.chisq = FALSE, prop.c = FALSE, prop.r = FALSE,
# +            dnn = c('actual default', 'predicted default'))
#
#
#   Cell Contents
# |-------------------------|
# |                       N |
# |         N / Table Total |
# |-------------------------|
#
# 
# Total Observations in Table:  100 
#
# 
#               | predicted default 
#actual default |        no |       yes | Row Total | 
#---------------|-----------|-----------|-----------|
#            no |        57 |        11 |        68 | 
#               |     0.570 |     0.110 |           | 
#---------------|-----------|-----------|-----------|
#           yes |        16 |        16 |        32 | 
#               |     0.160 |     0.160 |           | 
#---------------|-----------|-----------|-----------|
#  Column Total |        73 |        27 |       100 | 
#---------------|-----------|-----------|-----------|
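
To read the table as summary numbers, the four cell counts can be turned into an accuracy and a per-class hit rate. The sketch below only re-enters the counts shown above; the variable names are made up for illustration.

```r
# Counts copied from the cross table above (rows: actual, columns: predicted)
confusion <- matrix(c(57, 11,
                      16, 16),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(actual    = c("no", "yes"),
                                    predicted = c("no", "yes")))

# Accuracy: correctly classified cases over all cases
accuracy <- sum(diag(confusion)) / sum(confusion)
accuracy     # 0.73

# Share of actual defaults that the model predicted as defaults
recall_yes <- confusion["yes", "yes"] / sum(confusion["yes", ])
recall_yes   # 0.5
```

So the model is right on 73 of the 100 test cases, but it catches only half of the actual defaults.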