Recreating a Minitab normal probability plot

I am trying to recreate the following plot with R. Minitab describes it as a normal probability plot.

alt text

probplot (from the e1071 package) gets you most of the way there. Unfortunately, I can't figure out how to add the confidence bands around this plot.

Similarly, ggplot's stat_qq() seems to present similar information with a transformed x axis. geom_smooth() looks like a likely candidate for adding the bands, but I couldn't get it to work.

Finally, the Getting Genetics Done folks describe something similar here.

Sample data to recreate the above graph:

 x <- c(40.2, 43.1, 45.5, 44.5, 39.5, 38.5, 40.2, 41.0, 41.6, 43.1, 44.9, 42.8) 

If anyone has a solution with base graphics or ggplot, I would appreciate it!

EDIT

After looking at the details of probplot, I determined that this is how it computes the fitted line on the plot:

    > xl <- quantile(x, c(0.25, 0.75))
    > yl <- qnorm(c(0.25, 0.75))
    > slope <- diff(yl)/diff(xl)
    > int <- yl[1] - slope * xl[1]
    > slope
       75% 
    0.4151 
    > int
       75% 
    -17.36 

Indeed, these results agree closely with what is stored in the probplot object:

    > check <- probplot(x)
    > str(check)
    List of 3
     $ qdist:function (p)
     $ int  : Named num -17.4
      ..- attr(*, "names")= chr "75%"
     $ slope: Named num 0.415
      ..- attr(*, "names")= chr "75%"
     - attr(*, "class")= chr "probplot"

However, including this information in ggplot2 or the base graphics does not give the same results.

 probplot(x) 

alt text

Versus:

    ggplot(data = df, aes(x = x, y = y)) +
      geom_point() +
      geom_abline(intercept = int, slope = slope)

alt text

I get similar results using base R graphics:

    plot(df$x, df$y)
    abline(int, slope, col = "red")

Finally, I found out that the last two lines of the legend relate to the Anderson-Darling test for normality and can be reproduced using the nortest package.

    > ad.test(x)

        Anderson-Darling normality test

    data:  x
    A = 0.2303, p-value = 0.7502
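For reference, the A statistic above can also be sketched by hand in base R. This is a minimal sketch that assumes the usual Anderson-Darling formula for a normal with mean and standard deviation estimated from the sample; the variable names (z, logp_lower, logp_upper, A) are only illustrative, and the p-value is not reproduced here.

```r
x <- c(40.2, 43.1, 45.5, 44.5, 39.5, 38.5, 40.2, 41.0, 41.6, 43.1, 44.9, 42.8)

n <- length(x)
z <- (sort(x) - mean(x)) / sd(x)           # standardised order statistics

logp_lower <- pnorm(z, log.p = TRUE)       # log Phi(z_i)
logp_upper <- pnorm(-z, log.p = TRUE)      # log(1 - Phi(z_i))

# A = -n - (1/n) * sum_i (2i - 1) * [log Phi(z_i) + log(1 - Phi(z_(n+1-i)))]
A <- -n - mean((2 * seq_len(n) - 1) * (logp_lower + rev(logp_upper)))

# The remaining legend entries are plain summary statistics
c(Mean = mean(x), StDev = sd(x), N = n, AD = A)
```

If this matches what ad.test computes, A should agree with the value printed above up to rounding.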
3 answers

Perhaps this is something you can build on. By default, stat_smooth() uses level = 0.95.

    library(ggplot2)
    df <- data.frame(x = sort(x), y = ppoints(x))
    ggplot(df, aes(x, y)) +
      geom_point() +
      stat_smooth() +
      scale_y_continuous(limits = c(0, 1),
                         breaks = seq(from = 0.05, to = 1, by = 0.05),
                         labels = scales::percent)  # replaces the old formatter = "percent" argument

Try the qqPlot function in the QTLRel package.

    library(QTLRel)
    qqPlot(rnorm(100))



You are using the wrong y values: they must be normal quantiles (with the axis labelled by probabilities). The following puts the line in the right place:

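    library(ggplot2)

    # sorted data against standard normal quantiles
    df <- data.frame(x = sort(x), y = qnorm(ppoints(length(x))))

    # probabilities shown on the y axis, and their positions on the quantile scale
    probs <- c(0.01, 0.05, seq(0.1, 0.9, by = 0.1), 0.95, 0.99)
    qprobs <- qnorm(probs)

    # line through the first and third quartiles, as in probplot
    xl <- quantile(x, c(0.25, 0.75))
    yl <- qnorm(c(0.25, 0.75))
    slope <- diff(yl) / diff(xl)
    int <- yl[1] - slope * xl[1]

    ggplot(data = df, aes(x = x, y = y)) +
      geom_point() +
      geom_abline(intercept = int, slope = slope) +
      scale_y_continuous(limits = range(qprobs), breaks = qprobs, labels = 100 * probs) +
      labs(y = "Percent", x = "Data")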
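Since the question also asked about base graphics, the same correction can be sketched there. This is only a rough equivalent of the ggplot call above, assuming the same quartile-based line; the axis is drawn by hand with axis(), and the colour and las settings are arbitrary choices.

```r
x <- c(40.2, 43.1, 45.5, 44.5, 39.5, 38.5, 40.2, 41.0, 41.6, 43.1, 44.9, 42.8)
n <- length(x)

# Probabilities to label on the y axis
probs <- c(0.01, 0.05, seq(0.1, 0.9, by = 0.1), 0.95, 0.99)

# Line through the first and third quartiles, as in the question
xl <- quantile(x, c(0.25, 0.75))
yl <- qnorm(c(0.25, 0.75))
slope <- diff(yl) / diff(xl)
int <- yl[1] - slope * xl[1]

# Plot sorted data against normal quantiles and suppress the default y axis
plot(sort(x), qnorm(ppoints(n)),
     ylim = range(qnorm(probs)), yaxt = "n",
     xlab = "Data", ylab = "Percent")

# Ticks sit at normal quantiles but are labelled as percentages
axis(2, at = qnorm(probs), labels = 100 * probs, las = 1)
abline(int, slope, col = "red")
```

The key point is the same as in the ggplot version: the points and the line live on the quantile scale, and only the tick labels are probabilities.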

To add confidence bounds, as in Minitab, you can do the following:

    library(MASS)  # for fitdistr()
    fd <- fitdistr(x, "normal")  # maximum-likelihood fit of a univariate distribution
    xp_hat <- fd$estimate[1] + qprobs * fd$estimate[2]  # estimated percentiles of the fitted normal
    v_xp_hat <- fd$sd[1]^2 + qprobs^2 * fd$sd[2]^2 + 2 * qprobs * fd$vcov[1, 2]  # variance of the estimated percentiles
    xpl <- xp_hat + qnorm(0.025) * sqrt(v_xp_hat)  # lower bound
    xpu <- xp_hat + qnorm(0.975) * sqrt(v_xp_hat)  # upper bound
    df.bound <- data.frame(xp = xp_hat, xpl = xpl, xpu = xpu, nquant = qprobs)
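To make the variance term less opaque, here is a sketch of the same calculation written out in closed form, without MASS. It assumes the standard large-sample results for the normal maximum-likelihood estimates (variance of the mean is sigma^2/n, variance of the standard deviation is sigma^2/(2n), zero covariance); the names mu_hat, sigma_hat, se_xp and bands are illustrative only.

```r
x <- c(40.2, 43.1, 45.5, 44.5, 39.5, 38.5, 40.2, 41.0, 41.6, 43.1, 44.9, 42.8)
n <- length(x)

# Maximum-likelihood estimates (note the divisor n, not n - 1)
mu_hat    <- mean(x)
sigma_hat <- sqrt(mean((x - mu_hat)^2))

probs <- c(0.01, 0.05, seq(0.1, 0.9, by = 0.1), 0.95, 0.99)
z     <- qnorm(probs)

# Estimated percentile: x_p = mu + z_p * sigma
xp_hat <- mu_hat + z * sigma_hat

# Var(x_p) = Var(mu) + z_p^2 * Var(sigma) + 2 * z_p * Cov(mu, sigma)
#          = sigma^2 / n + z_p^2 * sigma^2 / (2 * n) + 0
se_xp <- sigma_hat * sqrt(1 / n + z^2 / (2 * n))

bands <- data.frame(
  percent = 100 * probs,
  lower   = xp_hat + qnorm(0.025) * se_xp,
  fit     = xp_hat,
  upper   = xp_hat + qnorm(0.975) * se_xp
)
bands
```

The standard error is smallest at the median and grows toward the tails, which is why the bounds flare outward at both ends of the plot.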

Then add the following layers to the ggplot above. The first replaces the slope-and-intercept line with the line of estimated percentiles; the other two draw the lower and upper bounds.

    geom_line(data = df.bound, aes(x = xp,  y = nquant)) +
    geom_line(data = df.bound, aes(x = xpl, y = nquant)) +
    geom_line(data = df.bound, aes(x = xpu, y = nquant))
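Putting the pieces of this answer together, a complete script might look like the sketch below. It assumes ggplot2 and MASS are installed and simply assembles the steps shown above; the dashed linetype for the bounds is a cosmetic choice of mine, not something taken from Minitab.

```r
library(ggplot2)
library(MASS)  # fitdistr()

x <- c(40.2, 43.1, 45.5, 44.5, 39.5, 38.5, 40.2, 41.0, 41.6, 43.1, 44.9, 42.8)

# Points: sorted data against standard normal quantiles
df <- data.frame(x = sort(x), y = qnorm(ppoints(length(x))))

# Percent axis: ticks at normal quantiles, labelled as percentages
probs  <- c(0.01, 0.05, seq(0.1, 0.9, by = 0.1), 0.95, 0.99)
qprobs <- qnorm(probs)

# Fitted percentiles and approximate 95% bounds from the ML fit
fd       <- fitdistr(x, "normal")
xp_hat   <- fd$estimate[1] + qprobs * fd$estimate[2]
v_xp_hat <- fd$sd[1]^2 + qprobs^2 * fd$sd[2]^2 + 2 * qprobs * fd$vcov[1, 2]
df.bound <- data.frame(
  xp     = xp_hat,
  xpl    = xp_hat + qnorm(0.025) * sqrt(v_xp_hat),
  xpu    = xp_hat + qnorm(0.975) * sqrt(v_xp_hat),
  nquant = qprobs
)

p <- ggplot(df, aes(x = x, y = y)) +
  geom_point() +
  geom_line(data = df.bound, aes(x = xp,  y = nquant)) +
  geom_line(data = df.bound, aes(x = xpl, y = nquant), linetype = "dashed") +
  geom_line(data = df.bound, aes(x = xpu, y = nquant), linetype = "dashed") +
  scale_y_continuous(limits = range(qprobs), breaks = qprobs, labels = 100 * probs) +
  labs(x = "Data", y = "Percent")

print(p)
```

Running it should draw the points, the fitted line, and the two bounds on a single percent-labelled axis.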