I am running a decision tree model using the rpart package in R. Here's what I do (sketched in code below):
- Loading my data with read.csv
- Deleting unwanted columns
- Splitting my dataset into training and testing sets
- Fitting my model on the training set - it keeps running all day and never finishes
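For context, here is a minimal sketch of those steps; the file name and the 70/30 split are placeholders rather than my exact script:

library(rpart)

# Load the data (file name is a placeholder)
mydata <- read.csv("sales_data.csv")

# Keep only the columns used in the model, drop the rest
keep <- c("Database", "Market_Description", "Manufacturer",
          "Brand", "Sub_Brand", "Age_Group", "FMT_Category")
mydata <- mydata[, keep]

# Split into training and testing sets (assumed 70/30 split)
set.seed(123)
idx <- sample(seq_len(nrow(mydata)), size = floor(0.7 * nrow(mydata)))
trainingset <- mydata[idx, ]
testset     <- mydata[-idx, ]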
Here is a summary of my dataset (output of str()):
'data.frame': 117919 obs. of 7 variables:
$ Database : Factor w/ 2 levels "DBIL","DBPD": 1 1 1 1 1 1 1 1 1 1 ...
$ Market_Description: Factor w/ 1 level "MY (PM)": 1 1 1 1 1 1 1 1 1 1 ...
$ Manufacturer : Factor w/ 21 levels "21 Century","Abbott Lab",..: 4 3 4 4 4 4 3 3 3 3 ...
$ Brand : Factor w/ 133 levels "","21 Century",..: 34 26 34 34 34 34 26 26 26 26 ...
$ Sub_Brand : Factor w/ 194 levels "","0-6 Bulan",..: 9 6 9 9 9 9 6 6 6 6 ...
$ Age_Group : Factor w/ 5 levels "","Adultenr",..: 1 1 1 1 1 1 1 1 1 1 ...
$ FMT_Category : Factor w/ 10 levels "Adult Powders (excl Super Bev)",..: 5 5 5 5 5 5 5 5 5 5 ...
Here is my script for the model.
fit <- rpart(FMT_Category ~ Database + Market_Description + Manufacturer + Brand + Sub_Brand + Age_Group, data = trainingset)
My dataset has 117,919 rows. memory.limit() in R returns 8065, and mem_used() reports only about 40 MB, so I don't think memory is the issue. I have also tried reading the data with stringsAsFactors = FALSE, and I tried a Python script and Weka as well, but the fit still never finishes.
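For reference, this is roughly how I checked memory; mem_used() comes from the pryr package, and the exact calls below are an approximation of what I ran:

memory.limit()                                  # reports 8065 (MB) on my Windows machine
library(pryr)
mem_used()                                      # reports roughly 40 MB currently in use
print(object.size(trainingset), units = "MB")   # size of the training data itself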
I suspect the problem is the Sub_Brand variable, which has 194 levels. What can I do to get the model to run?