Glm and relative risk - copy Stata code to R

Question

Glm and relative risk - copy Stata code to R

Can someone help me repeat these calculations of the relative risk (and its confidence interval) in R?

A similar procedure used in Stata is described here . Can someone show me how to do this in R (my data has clusters and strata, but I took a simpler example)? I tried the relrisk.est function, but I would prefer to use the survey package, as this handles very complex projects. I would also like to compare the ratings of Stata and R. I use Poisson as suggested here .

###STATA CODE use http://www.ats.ucla.edu/stat/stata/faq/eyestudy tabulate carrot lenses *same as R binomial svyglm below xi: glm lenses carrot, fam(bin) *switch reference code char carrot[omit] 1 xi: glm lenses i.carrot, fam(poisson) link(log) nolog robust eform ###R library(foreign) library(survey) D<-read.dta("http://www.ats.ucla.edu/stat/stata/faq/eyestudy.dta") table(D$lenses,D$carrot) D$wgt<-rep(1,nrow(D)) Dd<-svydesign(id=~1,data=D,weights=~wgt) #change category and eform....? svyglm(lenses~carrot,Dd,family=binomial) svyglm(lenses~carrot,Dd,family=quasipoisson(log))

+4

r relative stata glm

Marco m Jan 13 '13 at 8:49

source share

1 answer

Theodore lytras · Accepted Answer · 2013-01-13T11:43:25+0000

Your example is a simple data set, so you don’t need to use a survey package at all. I would also suggest that by exploring multiple regression with R, you start with simpler examples and gradually build up your understanding of the methods used.

In the end, my opinion is that the philosophy of Stata and R is different. Stata easily throws a ton of information onto your face, not knowing what it means or how it is received. R, on the other hand, can be just as powerful (or even more powerful) and versatile, but it doesn’t need to “automate” Stata and makes you walk slowly to get the desired results.

Here's a more literal translation of your Stata code:

 require(foreign) data <- read.dta("http://www.ats.ucla.edu/stat/stata/faq/eyestudy.dta") with(data, table(carrot, lenses)) glm.out.1 <- glm(lenses ~ carrot, family="binomial", data=data) summary(glm.out.1) logLik(glm.out.1) # get the log-likelihood for the model, as well glm.out.2 <- glm(lenses ~ factor(carrot, levels=c(1,0)), family="poisson"(link="log"), data=data) summary(glm.out.2) logLik(glm.out.2) exp(cbind(coefficients(glm.out.2), confint(glm.out.2))) # deriving robust estimates. vcovHC() is in the sandwich package. require(sandwich) robSE <- sqrt(diag(vcovHC(glm.out.2, type="HC")))[2] rob <- coef(glm.out.2)[2] rob <- exp(c(rob, rob+qnorm(0.025)*robSE, rob-qnorm(0.025)*robSE)) names(rob) <- c("", "2.5 %", "97.5 %") rob

Please note that (link="log") in the second glm() call is not required, because it is the default communication function when family="poisson" .

To get "reliable" ratings, I had to read this , which was very useful. You must use the vcovHC() function in the sandwich package to get a different variance covariance matrix than the one calculated by glm() and use it to calculate standard error and parameter estimates.

“Strong” ratings were almost identical to what I got from Stata, right down to the third decimal place. All other results were completely identical; run the code and see for yourself.

Oh, and one more thing: when you find your way using glm () with non-stratified projects, then draw your way through the survey package, which contains copies of this and other analysis functions modified for complex projects. I also recommend that you read Thomas Lumley's book Comprehensive Surveys: A Guide to Analysis Using R.

Glm and relative risk - copy Stata code to R

More articles: