Using the survival tree from the "rpart" package in R to predict new observations

Question


I am trying to use the "rpart" package in R to build a survival tree, and I hope to use this tree to then make predictions for other observations.

I know there have been many SO questions about rpart and prediction; however, I could not find any that address the problem that (I think) is specific to using rpart with a "Surv" object.

My particular problem is with the interpretation of the results of the predict function. An example is useful:

library(rpart)
library(survival)  # provides Surv() and survfit(); OIsurv only pulled it in as a dependency

# Make Data:
set.seed(4)
dat = data.frame(X1 = sample(x = c(1,2,3,4,5), size = 1000, replace=TRUE))
dat$t = rexp(1000, rate=dat$X1)
dat$t = dat$t / max(dat$t)
dat$e = rbinom(n = 1000, size = 1, prob = 1-dat$t )

# Survival Fit:
sfit = survfit(Surv(t, event = e) ~ 1, data=dat)
plot(sfit)

# Tree Fit:
tfit = rpart(formula = Surv(t, event = e) ~ X1 , data = dat, control=rpart.control(minsplit=30, cp=0.01))
plot(tfit); text(tfit)

# Survival Fit, Broken by Node in Tree:
dat$node = as.factor(tfit$where)
plot( survfit(Surv(dat$t, event = dat$e)~dat$node) )

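So far so good. My understanding is that rpart is fitting exponential survival curves to subsets of my data. Based on that, I believe that when I call predict(tfit), I get, for each observation, the parameter of the exponential curve for that observation. So, for example, if predict(tfit)[1] is .46, then for the first observation in my dataset the curve is given by P(s) = exp(−λt), where λ = .46.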

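This seems like exactly what I want. For each observation (or any new observation), I could get the predicted probability that it is alive/dead at a given time. (EDIT: I now realize this is probably a misconception: these curves give the probability of surviving an interval, not the probability of being alive/dead. That does not change the problem described below, though.)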

However, when I try to use the exponential formula...

# Predict:
# an attempt to use the rates extracted from the tree to
# capture the survival curve formula in each tree node.
rates = unique(predict(tfit))
for (rate in rates) {
  grid= seq(0,1,length.out = 100)
  lines(x= grid, y= exp(-rate*(grid)), col=2)
}

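...the curves do not match. Here I split the dataset the same way the survival tree did and then used survfit to plot a non-parametric curve for each partition; those are the black lines. I also drew lines obtained by plugging (what I thought was) the "rate" parameter into (what I thought was) the exponential survival formula; those are the red lines.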

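I understand that the non-parametric and parametric fits need not be identical, but this looks like more than that: it seems I need to rescale my X variable, or something along those lines.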

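Basically, I do not seem to understand the formula that rpart/survival uses under the hood. Can anyone help me get from (1) the rpart model to (2) a survival equation for an arbitrary observation?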


r tree rpart survival-analysis

jwdink 08 . '15 2:48


Achim Zeileis · Accepted Answer · 2015-06-09T13:18:58+0000

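Internally, the survival data are rescaled so that the predicted rate in the root node is always fixed at 1.000. The values returned by predict() are therefore always relative to survival in the root node, i.e., higher or lower by some factor. See Section 8.4 of vignette("longintro", package = "rpart") for details. In any case, the Kaplan-Meier curves you plotted correspond to what is reported in the rpart vignette.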
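To make the "relative to the root node" reading concrete, here is a minimal sketch, not rpart's own method. It assumes the relative rate can be read as a proportional-hazards multiplier on the root node's cumulative hazard H0(t), so that a node's survival curve is roughly exp(-rate * H0(t)). The names newdat, H0, and surv_hat are made up for illustration, and rpart also shrinks its rate estimates, so the curves are approximate.

```r
library(survival)
library(rpart)

# Same simulated data as in the question
set.seed(4)
dat <- data.frame(X1 = sample(x = c(1, 2, 3, 4, 5), size = 1000, replace = TRUE))
dat$t <- rexp(1000, rate = dat$X1)
dat$t <- dat$t / max(dat$t)
dat$e <- rbinom(n = 1000, size = 1, prob = 1 - dat$t)

tfit <- rpart(Surv(t, e) ~ X1, data = dat,
              control = rpart.control(minsplit = 30, cp = 0.01))

# Root-node cumulative hazard H0(t), wrapped as a right-continuous step function
root_fit <- survfit(Surv(t, e) ~ 1, data = dat)
H0 <- stepfun(root_fit$time, c(0, root_fit$cumhaz))

# Relative rates for new observations (hypothetical covariate values)
newdat <- data.frame(X1 = c(1, 3, 5))
rel_rate <- predict(tfit, newdata = newdat)

# Assumption: hazard in a node = relative rate * root hazard
grid <- seq(0, 1, length.out = 100)
surv_hat <- sapply(rel_rate, function(r) exp(-r * H0(grid)))

matplot(grid, surv_hat, type = "l", lty = 1,
        xlab = "t", ylab = "Approximate S(t)")
```

Each column of surv_hat is the approximate curve for one row of newdat. Overlaying these on the per-node survfit curves from the question is a quick way to check how well the assumption holds for a given dataset.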

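If you want to plot the Kaplan-Meier curves in the tree directly and obtain predicted median survival times, you can coerce the rpart tree to a constparty object, as provided by the partykit package: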

library("partykit")
(tfit2 <- as.party(tfit))
## Model formula:
## Surv(t, event = e) ~ X1
## 
## Fitted party:
## [1] root
## |   [2] X1 < 2.5
## |   |   [3] X1 < 1.5: 0.192 (n = 213)
## |   |   [4] X1 >= 1.5: 0.082 (n = 213)
## |   [5] X1 >= 2.5: 0.037 (n = 574)
## 
## Number of inner nodes:    2
## Number of terminal nodes: 3
##
plot(tfit2)

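The printed output shows the median survival time in each node, and the plot shows the corresponding Kaplan-Meier curves. Both can also be obtained from predict() by setting the type argument to "response" or "prob", respectively.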

predict(tfit2, type = "response")[1]
##          5 
## 0.03671885 
predict(tfit2, type = "prob")[[1]]
## Call: survfit(formula = y ~ 1, weights = w, subset = w > 0)
## 
##  records    n.max  n.start   events   median  0.95LCL  0.95UCL 
## 574.0000 574.0000 574.0000 542.0000   0.0367   0.0323   0.0408
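Since the question is about new observations, the same predict() calls can be given a newdata argument. The following is a minimal sketch under the assumption that the new data frame has the same column name (X1) as the training data; newdat and its values are hypothetical.

```r
library(survival)
library(rpart)
library(partykit)

# Same simulated data and tree as in the question
set.seed(4)
dat <- data.frame(X1 = sample(x = c(1, 2, 3, 4, 5), size = 1000, replace = TRUE))
dat$t <- rexp(1000, rate = dat$X1)
dat$t <- dat$t / max(dat$t)
dat$e <- rbinom(n = 1000, size = 1, prob = 1 - dat$t)

tfit <- rpart(Surv(t, e) ~ X1, data = dat,
              control = rpart.control(minsplit = 30, cp = 0.01))
tfit2 <- as.party(tfit)

# Hypothetical new observations
newdat <- data.frame(X1 = c(1, 3, 5))

# Terminal node each new observation falls into
node_id <- predict(tfit2, newdata = newdat, type = "node")

# Median survival time of that node
med <- predict(tfit2, newdata = newdat, type = "response")

# Kaplan-Meier fit of that node (a list with one survfit object per row)
km <- predict(tfit2, newdata = newdat, type = "prob")
print(km[[1]])
```

Every new observation simply inherits the Kaplan-Meier curve of the terminal node it lands in, so rows that fall into the same node get identical predictions.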

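As an alternative to rpart survival trees, you might also consider the non-parametric survival trees based on conditional inference in ctree() (using logrank scores), or fully parametric survival trees built with the general mob() infrastructure from the partykit package.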
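As a rough illustration of the ctree() route, the sketch below fits a conditional inference survival tree to the same simulated data with default settings and predicts median survival for a few made-up covariate values. The names ct, newdat, and med_ct are illustrative, and the resulting splits need not match the rpart tree.

```r
library(survival)
library(partykit)

# Same simulated data as in the question
set.seed(4)
dat <- data.frame(X1 = sample(x = c(1, 2, 3, 4, 5), size = 1000, replace = TRUE))
dat$t <- rexp(1000, rate = dat$X1)
dat$t <- dat$t / max(dat$t)
dat$e <- rbinom(n = 1000, size = 1, prob = 1 - dat$t)

# Conditional inference tree with a censored response
ct <- ctree(Surv(t, e) ~ X1, data = dat)
print(ct)

# Median survival time for hypothetical new observations
newdat <- data.frame(X1 = c(1, 3, 5))
med_ct <- predict(ct, newdata = newdat, type = "response")
```

As with the converted rpart tree, type = "prob" should return the per-node Kaplan-Meier fits if full curves are needed rather than medians.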
