In general I would say it's advisable to use refit=FALSE in this case, but let's go ahead and try a computational experiment.
First, fit the model without the random slope to the sleepstudy data, then simulate data from this model:
    library(lme4)
    mod0 <- lmer(Reaction ~ Days + (1|Subject), data=sleepstudy)
    ## full model with a random slope (needed for the comparison below)
    mod1 <- lmer(Reaction ~ Days + (Days|Subject), data=sleepstudy)
    ## simulate null datasets from the reduced model
    ## (1000 here; the exact number used originally isn't stated)
    simdat <- simulate(mod0, 1000)
Now refit the null data with both the full and the reduced model, and save the distributions of p-values generated by anova() with and without refit=FALSE. This is essentially a parametric bootstrap test of the null hypothesis; we want to see whether it has the appropriate characteristics (i.e., a uniform distribution of p-values).
    ## refit both models to a simulated response x and extract the
    ## p-values with and without refit=FALSE
    sumfun <- function(x) {
        m0 <- refit(mod0, x)
        m1 <- refit(mod1, x)
        a_refit <- suppressMessages(anova(m0, m1)["m1", "Pr(>Chisq)"])
        a_no_refit <- anova(m0, m1, refit=FALSE)["m1", "Pr(>Chisq)"]
        c(refit=a_refit, no_refit=a_no_refit)
    }
I like plyr::laply for its convenience, although you can just as easily use a for loop or one of the other *apply approaches.
    library(plyr)
    pdist <- laply(simdat, sumfun, .progress="text")
    library(ggplot2); theme_set(theme_bw())
    library(reshape2)
    ggplot(melt(pdist), aes(x=value, fill=Var2)) +
        geom_histogram(aes(y=..density..),
                       alpha=0.5, position="identity", binwidth=0.02) +
        geom_hline(yintercept=1, lty=2)
    ggsave("nullhist.png", height=4, width=5)
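If you'd rather not add a plyr dependency, here is a minimal base-R equivalent of the laply step above (a sketch, reusing the simdat and sumfun objects already defined):

    ## apply sumfun to each simulated response (the columns of simdat);
    ## t() turns the 2-row result into a matrix with columns refit/no_refit
    pdist <- t(sapply(simdat, sumfun))

The rest of the code should work unchanged, since melt() and colMeans() only need the two named columns.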

Type I error rate for alpha = 0.05:
    colMeans(pdist < 0.05)
    ##    refit no_refit
    ##    0.021    0.026
You can see that in this case the two procedures give practically the same answer, and both procedures are strongly conservative, for well-known reasons having to do with the fact that the null value of the hypothesis test lies on the boundary of its feasible space. For the specific case of testing a single simple random effect, halving the p-value gives an appropriate answer (see Pinheiro and Bates 2000, among others); it actually seems to give reasonable answers here too, although the correction is not really justified, because here we are dropping two random-effects parameters (the random slope and the correlation between the random slope and random intercept):
    colMeans(pdist/2 < 0.05)
Other items:
- You can perform a similar exercise using the PBmodcomp function from the pbkrtest package (a hedged sketch follows this list).
- The RLRsim package is designed specifically for fast randomization (parametric bootstrap) tests of null hypotheses about random-effects terms, but it does not seem to work in this slightly more complicated situation.
- See the relevant GLMM FAQ section for similar information, including arguments for why you might not want to test the significance of random effects at all ...
- For extra credit, you could redo the parametric bootstrap runs using deviance differences (-2 log-likelihood differences) rather than p-values as output, and check whether the results match a mixture of a chi^2_0 (a point mass at 0) and a chi^2_n (where n is probably 2, but I wouldn't be sure about that for this geometry); a sketch of this also follows below.
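For the first point, a minimal sketch of the pbkrtest route, reusing mod0 and mod1 from above (nsim=500 is an arbitrary illustrative choice, not a recommendation):

    library(pbkrtest)
    ## parametric bootstrap comparison of the full (mod1) and
    ## reduced (mod0) models; this can be slow
    PBmodcomp(mod1, mod0, nsim=500)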
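And for the extra-credit item, one way the mixture check might look (a sketch, assuming mod0, mod1, and simdat as above; the df value n=2 is the conjecture from the text, not something I've verified):

    ## collect deviance differences (-2 log-likelihood) instead of p-values;
    ## this uses the REML fits, i.e. the refit=FALSE flavor of the comparison
    devfun <- function(x) {
        m0 <- refit(mod0, x)
        m1 <- refit(mod1, x)
        as.numeric(2 * (logLik(m1) - logLik(m0)))
    }
    devdist <- sapply(simdat, devfun)
    ## quantile function of the conjectured equal mixture of
    ## chi^2_0 (point mass at 0) and chi^2_n
    qmix <- function(p, n) ifelse(p <= 0.5, 0, qchisq(2 * p - 1, df = n))
    ## Q-Q plot of simulated deviance differences against the mixture;
    ## points near the 1:1 line support the conjecture
    pp <- ppoints(length(devdist))
    plot(qmix(pp, n = 2), sort(devdist),
         xlab = "mixture quantiles", ylab = "simulated deviance differences")
    abline(0, 1, lty = 2)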