Recently, I have been trying to fit a number of random effects models to relatively large data sets. Say 50,000 people (or more) observed at up to 25 time points. With such a large sample size, we include a good number of predictors that we want to adjust for, perhaps 50 or so fixed effects. I'm fitting the model to a binary outcome using lme4::glmer in R, with random intercepts for each subject. I can't go into the specifics of the data, but the basic format of the glmer command I used was:
fit <- glmer(outcome ~ treatment + study_quarter + dd_quarter + (1|id), family = "binomial", data = dat)
where both study_quarter and dd_quarter are factors with approximately 20 levels each.
When I try to fit this model in R, it runs for about 12-15 hours and then returns an error that it failed to converge. I did a bunch of troubleshooting (e.g. following these guidelines), with no improvement. And the convergence isn't even close in the end (the maximum gradient is around 5-10, while the convergence criterion is 0.001).
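For reference, the kind of tweak I tried looked roughly like the following (the optimizer and iteration limit shown here are just an illustration of what those guidelines suggest, not necessarily the exact settings I used):

library(lme4)
# switch to the bobyqa optimizer and raise its evaluation limit via glmerControl
fit <- glmer(outcome ~ treatment + study_quarter + dd_quarter + (1|id),
             family = "binomial", data = dat,
             control = glmerControl(optimizer = "bobyqa",
                                    optCtrl = list(maxfun = 2e5)))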
I then tried fitting the model in Stata using the melogit command. The model fit in under 2 minutes, with no convergence issues. The corresponding Stata command is:
melogit outcome treatment i.study_quarter i.dd_quarter || id:
What gives? Does Stata just have a better fitting algorithm, or is it better optimized for large models and large data sets? It's really surprising how different the run times are.
r lme4 mixed-models stata
Jonathan Gellar