Geom_smooth for a subset of data

Question


Here is some example data and a plot:

library(ggplot2)

set.seed(18)
data = data.frame(
  y = c(rep(0:1, 3), rnorm(18, mean = 0.5, sd = 0.1)),
  colour = rep(1:2, 12),
  x = rep(1:4, each = 6)
)
ggplot(data, aes(x = x, y = y, colour = factor(colour))) +
  geom_point() +
  geom_smooth(method = 'lm', formula = y ~ x, se = FALSE)


As you can see, the linear regressions are strongly influenced by the values at x = 1. Can I fit the regressions using only the points with x >= 2, while still displaying the points at x = 1 (where y is 0 or 1)? The resulting plot would be exactly the same except for the regression lines, which would no longer be affected by the values at x = 1.

+8

r regression ggplot2 subset

Remi.b Jun 19 '13 at 15:31


2 answers

The regular lm function has a weights argument, which you can use to assign a weight to each observation. That way you can control how much influence each observation has on the result. I think this is a more general way of solving the problem than subsetting the data. Of course, assigning ad hoc weights does not bode well for the statistical validity of the analysis. It is always better to have a rationale behind the weights, e.g. that observations with low weight have higher uncertainty.
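To see how the weights argument relates to the question, here is a minimal sketch in base R. It assumes 0/1 weights stored in a made-up column `w` (a hypothetical choice for illustration, not something this answer prescribes), and it pools both colour groups for brevity, whereas the plot fits one line per colour. With zero weight on the x = 1 points, the weighted fit should give the same coefficients as a fit on the subset x >= 2:

```r
set.seed(18)
data <- data.frame(
  y = c(rep(0:1, 3), rnorm(18, mean = 0.5, sd = 0.1)),
  colour = rep(1:2, 12),
  x = rep(1:4, each = 6)
)

# Hypothetical 0/1 weights: ignore the x == 1 points, keep everything else
data$w <- as.numeric(data$x >= 2)

# Weighted fit on all rows versus an unweighted fit on the subset
fit_weighted <- lm(y ~ x, data = data, weights = w)
fit_subset   <- lm(y ~ x, data = subset(data, x >= 2))

# Compare the two sets of coefficients (intercept and slope)
same_fit <- all.equal(coef(fit_weighted), coef(fit_subset))
print(same_fit)
```

Weights strictly between 0 and 1 would instead down-weight the x = 1 points without dropping them, which is where a rationale for the chosen values matters.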

I think ggplot2 uses the lm function under the hood, so you should be able to pass the weights argument. You can add the weights through an aesthetic (aes), assuming the weights are stored in a vector:

 ggplot(data, aes(x = x, y = y, colour = factor(colour))) +
   geom_point() +
   stat_smooth(aes(weight = runif(nrow(data))), method = 'lm')

You can also put the weights in a column of the dataset:

 ggplot(data, aes(x = x, y = y, colour = factor(colour))) +
   geom_point() +
   stat_smooth(aes(weight = weight), method = 'lm')

where the column is called weight.
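Applied to the data from the question, the column approach might look like the sketch below. The 0/1 values in `weight` are an assumption made purely for illustration: they give the x = 1 points no influence on the fit while still plotting them.

```r
library(ggplot2)

set.seed(18)
data <- data.frame(
  y = c(rep(0:1, 3), rnorm(18, mean = 0.5, sd = 0.1)),
  colour = rep(1:2, 12),
  x = rep(1:4, each = 6)
)

# Hypothetical weight column: 0 for the x == 1 points, 1 otherwise
data$weight <- as.numeric(data$x >= 2)

# All points are drawn; only the smoother sees the weights
p <- ggplot(data, aes(x = x, y = y, colour = factor(colour))) +
  geom_point() +
  stat_smooth(aes(weight = weight), method = "lm", formula = y ~ x, se = FALSE)
p
```

As far as I can tell, the lines are still drawn over the full x range of each group, including x = 1, even though those points do not affect the fit.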

+7

Paul Hiemstra Jun 19 '13 at 15:40


Matthew Plourde · Accepted Answer · 2013-06-19T15:39:39+0000

It is as simple as geom_smooth(data=subset(data, x >= 2), ...). This is fine if the plot is only for your own use, but be aware that something like this will mislead others unless you mention how the regression was performed. I would recommend changing the transparency of the excluded points.

 ggplot(data, aes(x = x, y = y, colour = factor(colour))) +
   geom_point(data = subset(data, x >= 2)) +
   geom_point(data = subset(data, x < 2), alpha = .2) +
   geom_smooth(data = subset(data, x >= 2), method = 'lm', formula = y ~ x, se = FALSE)
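One way to make the exclusion explicit to readers is to state it on the plot itself. The sketch below adds a caption with labs(); the caption wording and the helper names `fitted_rows` and `excluded_rows` are my own assumptions, not part of the answer above.

```r
library(ggplot2)

set.seed(18)
data <- data.frame(
  y = c(rep(0:1, 3), rnorm(18, mean = 0.5, sd = 0.1)),
  colour = rep(1:2, 12),
  x = rep(1:4, each = 6)
)

# Hypothetical helper names for the two subsets
fitted_rows   <- subset(data, x >= 2)  # used for the regression
excluded_rows <- subset(data, x < 2)   # shown faded, not used for the fit

p <- ggplot(data, aes(x = x, y = y, colour = factor(colour))) +
  geom_point(data = fitted_rows) +
  geom_point(data = excluded_rows, alpha = 0.2) +
  geom_smooth(data = fitted_rows, method = "lm", formula = y ~ x, se = FALSE) +
  labs(caption = "Lines fitted to x >= 2 only; faded points were excluded from the fit")
p
```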
