Find the corresponding node in the regression tree using rpart

I'm new to R, and I'm stuck with a pretty dumb problem.

I'm calibrating a regression tree with the rpart package in order to do some classification and some prediction.

Thanks to R, the calibration part is easy to do and easy to control.

    # The rpart package is needed
    library(rpart)

    # Load the large data file used for calibration
    my_data <- read.csv("my_file.csv", sep=",", header=TRUE)

    # Regression tree calibration
    tree <- rpart(Ratio ~ Attribute1 + Attribute2 + Attribute3 +
                  Attribute4 + Attribute5,
                  method="anova", data=my_data,
                  control=rpart.control(minsplit=100, cp=0.0001))

After calibrating a large decision tree, I want to find, for a given sample of new data, the leaf (cluster) each new element falls into, and therefore its predicted value.
The predict function seems ideal for this.

    # Read validation data
    validationData <- read.csv("my_sample.csv", sep=",", header=TRUE)

    # Search for the probability in the tree
    predict <- predict(tree, newdata=validationData, class="prob")

    # Dump them in a file
    write.table(predict, file="dump.txt")

However, with the predict method I only get the predicted ratio of my new elements, and I cannot find a way to get the leaf of the decision tree to which each new element belongs.

I think this should be pretty easy to get, since the predict method must find that leaf in order to return the ratio.

There are several values that can be passed to the predict method through the class= argument, but for a regression tree they all seem to return the same thing (the value of the target attribute of the decision tree).

Does anyone know how to get the corresponding node in the decision tree?

Being able to analyze the node with the path.rpart method would help me understand the results.

+6
r regression cart-analysis decision-tree rpart
3 answers

Benjamin's answer, unfortunately, does not work: type="vector" still returns the predicted values.

My solution is pretty kludgy, but I don't think there is a better way. The trick is to replace the predicted y values in the tree's frame with the corresponding node numbers.

    tree2 = tree
    tree2$frame$yval = as.numeric(rownames(tree2$frame))
    predict = predict(tree2, newdata=validationData)

Now the output of predict will be node numbers instead of the predicted y values.
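Since the question mentions path.rpart, here is a minimal sketch of feeding those node numbers into it. It assumes a regression tree; the built-in mtcars data and the names fit, node_tree, leaf_id and paths are only stand-ins for the asker's own tree and validation data.

```r
library(rpart)

# Illustrative regression tree on a built-in data set (stand-in for `tree`)
fit <- rpart(mpg ~ wt + hp + disp, data = mtcars, method = "anova",
             control = rpart.control(minsplit = 10, cp = 0.01))

# Same trick as above: overwrite fitted values with the node numbers
node_tree <- fit
node_tree$frame$yval <- as.numeric(rownames(node_tree$frame))

# Leaf node number for each "new" row (stand-in for validationData)
leaf_id <- predict(node_tree, newdata = mtcars[1:3, ])

# path.rpart returns, for each requested node, the splits leading to it
paths <- path.rpart(fit, nodes = unique(leaf_id), print.it = FALSE)
print(paths)
```

Each element of paths is a character vector with the split conditions from the root down to that leaf.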

(One note: the above worked in my case, where the tree was a regression tree and not a classification tree. In the case of a classification tree, you probably need to omit as.numeric or replace it with as.factor.)
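One way to sanity-check the trick on a regression tree is to compare it with the where component that rpart stores for the training data (the row of frame holding each observation's leaf). The sketch below assumes the built-in mtcars data, which has no missing values; the variable names are made up for the illustration.

```r
library(rpart)

# Illustrative regression tree on a built-in data set
fit <- rpart(mpg ~ wt + hp + disp, data = mtcars, method = "anova",
             control = rpart.control(minsplit = 10, cp = 0.01))

# Copy the tree and overwrite each node's fitted value with its node number
node_tree <- fit
node_tree$frame$yval <- as.numeric(rownames(node_tree$frame))

# Node numbers obtained through predict on the training rows
pred_leaf <- unname(predict(node_tree, newdata = mtcars))

# Node numbers recovered from fit$where (row index into fit$frame)
train_leaf <- as.numeric(rownames(fit$frame))[fit$where]

# Both should agree, and every returned node should be a leaf
all(pred_leaf == train_leaf)
all(fit$frame$var[match(pred_leaf, rownames(fit$frame))] == "<leaf>")
```

If both checks come back TRUE, the modified tree is reporting the same leaves rpart assigned during fitting.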

+11

I think you want type="vector" instead of class="prob" (I don't think class is a recognized parameter of the predict method), as described in the rpart docs:

If type = "vector": vector of predicted responses. For regression trees this is the mean response at the node, for Poisson trees it is the estimated response rate, and for classification trees it is the predicted class (as a number).
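As a quick check of what that means for a regression tree, the sketch below compares a plain predict call with type = "vector". It uses the built-in mtcars data purely as an illustration; fit, default_pred and vector_pred are made-up names.

```r
library(rpart)

# Illustrative regression tree on a built-in data set
fit <- rpart(mpg ~ wt + hp + disp, data = mtcars, method = "anova",
             control = rpart.control(minsplit = 10, cp = 0.01))

new_rows <- mtcars[1:5, ]

default_pred <- predict(fit, newdata = new_rows)
vector_pred  <- predict(fit, newdata = new_rows, type = "vector")

# For an anova tree both calls give the mean response of the leaf
identical(default_pred, vector_pred)

# Every returned value is one of the fitted node means in fit$frame
all(vector_pred %in% fit$frame$yval)
```

So for a regression tree, type = "vector" gives the leaf's mean response rather than an identifier of the leaf.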

+1

You can use the partykit package:

    library(rpart)       # provides rpart() and the kyphosis data
    library("partykit")

    fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)
    fit.party <- as.party(fit)
    predict(fit.party, newdata = kyphosis[1:4, ], type = "node")

For your example, just use

 predict(as.party(tree), newdata = validationData, type = "node") 
+1