Here is my code:
    set.seed(1)

    # Boruta on the HouseVotes84 data from mlbench
    library(mlbench)  # has HouseVotes84 data
    library(h2o)      # has rf

    # spin up h2o
    myh20 <- h2o.init(nthreads = -1)

    # read in data, throw some away
    data(HouseVotes84)
    hvo <- na.omit(HouseVotes84)

    # move from R to h2o
    mydata <- as.h2o(x = hvo, destination_frame = "mydata")

    # RF columns (input vs. output)
    idxy <- 1
    idxx <- 2:ncol(hvo)

    # split data
    splits <- h2o.splitFrame(mydata, c(0.8, 0.1))
    train <- h2o.assign(splits[[1]], key = "train")
    valid <- h2o.assign(splits[[2]], key = "valid")

    # make random forest
    my_imp.rf <- h2o.randomForest(y = idxy, x = idxx,
                                  training_frame = train,
                                  validation_frame = valid,
                                  model_id = "my_imp.rf",
                                  ntrees = 200)

    # find importance
    my_varimp <- h2o.varimp(my_imp.rf)
    my_varimp
The result I get is "variable importance."
The classical measures are "mean decrease in accuracy" and "mean decrease in Gini".
My results:
    > my_varimp
    Variable Importances:
       variable relative_importance scaled_importance percentage
    1        V4         3255.193604          1.000000   0.410574
    2        V5         1131.646484          0.347643   0.142733
    3        V3          921.106567          0.282965   0.116178
    4       V12          759.443176          0.233302   0.095788
    5       V14          492.264954          0.151224   0.062089
    6        V8          342.811554          0.105312   0.043238
    7       V11          205.392654          0.063097   0.025906
    8        V9          191.110046          0.058709   0.024105
    9        V7          169.117676          0.051953   0.021331
    10      V15          135.097076          0.041502   0.017040
    11      V13          114.906586          0.035299   0.014493
    12       V2           51.939777          0.015956   0.006551
    13      V10           46.716656          0.014351   0.005892
    14       V6           44.336708          0.013620   0.005592
    15      V16           34.779987          0.010684   0.004387
    16       V1           32.528778          0.009993   0.004103
So the relative importance of vote no. 4 (V4) is about 3255.2.
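As a sanity check on the table, the other two columns appear to be simple rescalings of relative_importance: scaled_importance looks like each value divided by the largest one, and percentage looks like each value divided by the column total. This is only my reading of the printed numbers, not something I found stated in the documentation. A minimal base-R sketch (the vector `rel` is just the relative_importance column typed in by hand, and the names `scaled` and `pct` are mine):

```r
# relative_importance values copied by hand from the h2o.varimp() output above
rel <- c(V4 = 3255.193604, V5 = 1131.646484, V3 = 921.106567,
         V12 = 759.443176, V14 = 492.264954, V8 = 342.811554,
         V11 = 205.392654, V9 = 191.110046, V7 = 169.117676,
         V15 = 135.097076, V13 = 114.906586, V2 = 51.939777,
         V10 = 46.716656, V6 = 44.336708, V16 = 34.779987,
         V1 = 32.528778)

# Assumption: scaled_importance = value / largest value
scaled <- rel / max(rel)

# Assumption: percentage = value / sum of all values
pct <- rel / sum(rel)

# Compare by eye with the scaled_importance and percentage columns
print(data.frame(relative = rel,
                 scaled = round(scaled, 6),
                 percentage = round(pct, 6)))
```

If that reading is right, only relative_importance carries the unknown units; the other two columns are dimensionless.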
Questions: What are the units of these values? How are they computed?
I looked in the documentation and the help pages, and I used Flow to inspect the model parameters. None of them mention "Gini" or "decrease in accuracy." Where should I look?