Loess and glm graphics with ggplot2

Question

I am trying to plot model predictions from a binary GLM against the empirical survival probability, using the Titanic data. To show the differences across class and sex I am using faceting, but there are two things I cannot figure out. First, I would like to restrict the loess curve to the range 0 to 1, but if I add ylim(c(0,1)) to the end of the plot, the ribbon around the loess curve gets cut off wherever part of it falls outside that range. Second, I would like to draw a line from the minimum x value (the predicted probability from the GLM) in each facet to the maximum x value (within the same facet) and y = 1, to show the GLM predicted probability.

[Figure: loess-titanic]

    # info on this data: http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/titanic3info.txt
    load(url('http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/titanic3.sav'))
    titanic <- titanic3[ , -c(3, 8:14)]; rm(titanic3)
    titanic <- na.omit(titanic)  # probably missing completely at random
    titanic$age <- as.numeric(titanic$age)
    titanic$sibsp <- as.integer(titanic$sibsp)
    titanic$survived <- as.integer(titanic$survived)

    training.df <- titanic[sample(nrow(titanic), nrow(titanic) / 2), ]
    validation.df <- titanic[!(row.names(titanic) %in% row.names(training.df)), ]

    glm.fit <- glm(survived ~ sex + sibsp + age + I(age^2) + factor(pclass) + sibsp:sex,
                   family = binomial(link = "probit"), data = training.df)
    glm.predict <- predict(glm.fit, newdata = validation.df, se.fit = TRUE, type = "response")

    plot.data <- data.frame(mean = glm.predict$fit,
                            response = validation.df$survived,
                            class = validation.df$pclass,
                            sex = validation.df$sex)

    require(ggplot2)
    ggplot(data = plot.data, aes(x = as.numeric(mean), y = as.integer(response))) +
      geom_point() +
      stat_smooth(method = "loess", formula = y ~ x) +
      facet_wrap( ~ class + sex, scale = "free") +
      ylim(c(0,1)) +
      xlab("Predicted Probability of Survival") +
      ylab("Empirical Survival Rate")

r ggplot2

Zach Jan 12 '13 at 16:10

1 answer

Ben Bolker · Accepted Answer · 2013-01-12T16:46:46+0000

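The answer to your first question is to use coord_cartesian(ylim=c(0,1)) instead of ylim(0,1); this is a moderately frequently asked question. ylim() sets the limits of the y scale, so values outside those limits (including parts of the ribbon) are treated as missing, whereas coord_cartesian() only zooms the view and discards nothing.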
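To see the difference in isolation, here is a minimal sketch on simulated data. The data frame d and the object names p, p_scale and p_zoom are made up for illustration and are not part of the question's code; only the ggplot2 calls themselves are taken from the discussion above.

```r
library(ggplot2)

set.seed(1)
# Simulated stand-in for plot.data (an assumption, not the Titanic data):
# a binary outcome whose success probability equals x
d <- data.frame(x = runif(200))
d$y <- rbinom(200, 1, d$x)

p <- ggplot(d, aes(x = x, y = y)) +
  geom_point() +
  stat_smooth(method = "loess", formula = y ~ x)

# Scale limits: anything outside [0, 1] is treated as missing,
# so the ribbon is broken where its band leaves the range
p_scale <- p + ylim(c(0, 1))

# Coordinate limits: the full smooth is computed and drawn,
# and the panel is merely zoomed to [0, 1]
p_zoom <- p + coord_cartesian(ylim = c(0, 1))
```

Printing p_scale and p_zoom side by side should show the ribbon interrupted in the first plot and simply clipped at the panel edge in the second.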

For your second question, there might be a way to do this within ggplot, but it was easier for me to summarize the data outside of it first:

    g0 <- ggplot(data = plot.data, aes(x = mean, y = response)) +
      geom_point() +
      stat_smooth(method = "loess") +
      facet_wrap( ~ class + sex, scale = "free") +
      coord_cartesian(ylim=c(0,1)) +
      labs(x="Predicted Probability of Survival",
           y="Empirical Survival Rate")

(I shortened your code a bit by dropping some defaults and using labs.)

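    library(plyr)  # provides ddply() and summarise()

    # minimum and maximum predicted probability within each facet
    ss <- ddply(plot.data, c("class","sex"), summarise,
                minx=min(mean), maxx=max(mean))

    # red segment along y = x, spanning each facet's range of predictions
    g0 + geom_segment(data=ss, aes(x=minx, y=minx, xend=maxx, yend=maxx),
                      colour="red", alpha=0.5)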
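If plyr is not available, the same per-facet summary can probably be built with base R alone. The sketch below is an alternative, not the approach used in the answer, and it runs on a small simulated plot.data (an assumption made so that the example is self-contained) with the same column names mean, class and sex.

```r
set.seed(42)
# Hypothetical stand-in for plot.data with the same column names
plot.data <- data.frame(
  mean  = runif(60),
  class = rep(c("1st", "2nd", "3rd"), each = 20),
  sex   = rep(c("female", "male"), times = 30)
)

# Per-facet minimum and maximum of the predicted probability, base R only
minx <- aggregate(mean ~ class + sex, data = plot.data, FUN = min)
maxx <- aggregate(mean ~ class + sex, data = plot.data, FUN = max)
names(minx)[3] <- "minx"
names(maxx)[3] <- "maxx"

# One row per class/sex combination, with columns minx and maxx
ss <- merge(minx, maxx, by = c("class", "sex"))
```

The resulting ss has the same columns as the ddply version, so it can be passed to geom_segment(data = ss, ...) in the same way.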
