I want to do stepwise regression using AIC on a list of linear models. The idea is to build a list of linear models and then apply stepAIC to each element of the list. It fails.
Hi guys, I tried to track down the problem, and I seem to have found it. However, I do not understand the cause. Run the code below to see the difference between the three cases.
    require(MASS)
    n<-30
    x1<-rnorm(n, mean=0, sd=1)
I am fairly sure that stepAIC() needs the raw data from the data.frame "dat". That is what I suspected before (I hope I'm right). But stepAIC() has no parameter through which I can pass the original data frame. Evidently, for plain models that are not wrapped in a list, passing the model alone is enough (see the last three lines of the code). So I wonder:
Q1: How does stepAIC() know where to find the original data "dat", given that only the model object is passed as a parameter?
Q2: How can I tell whether stepAIC() has another parameter that is not explicitly listed on the help page? (Maybe my English is too poor to find it.)
Q3: How do I pass this parameter to stepAIC()?
The cause must lie somewhere in the environment of the apply function and in how the data is passed along. In either lm() or stepAIC(), the pointer/link to the raw data must get lost somewhere. I don't understand very well what an environment is in R. To me it was just a way of isolating local variables from global ones, but maybe it is more complicated than that. Can anyone explain this to me in relation to the problem above? Honestly, I don't get much out of the R documentation. Any better understanding will help me. Thanks.
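A minimal sketch of what seems to be going on, using simulated data rather than my real data. The names dat, fit_subset and subset_data are made up for illustration. As far as I understand, a fitted lm object stores the call that created it, and update() (which stepAIC appears to rely on when it refits) re-evaluates that call in the environment it is called from. If the call mentions a name that only existed inside a function, the refit cannot find it.

```r
library(MASS)

set.seed(1)
dat <- data.frame(x1 = rnorm(30), x2 = rnorm(30))
dat$y <- 1 + 2 * dat$x1 + rnorm(30)

# Case 1: model fitted directly. The stored call refers to `dat`,
# which is visible from where update()/stepAIC() are called.
fit_top <- lm(y ~ x1 + x2, data = dat)
print(fit_top$call)    # lm(formula = y ~ x1 + x2, data = dat)

# Case 2: model fitted inside a function (hypothetical helper).
# The stored call refers to the local argument name `subset_data`.
fit_subset <- function(subset_data) lm(y ~ x1 + x2, data = subset_data)
fit_local <- fit_subset(dat)
print(fit_local$call)  # lm(formula = y ~ x1 + x2, data = subset_data)

# Refitting re-evaluates the stored call where update() is called from.
refit_top <- update(fit_top, . ~ . - x2)            # works: `dat` is found
refit_local <- tryCatch(
  update(fit_local, . ~ . - x2),                    # fails: `subset_data` is gone
  error = function(e) conditionMessage(e)
)
print(refit_local)

# stepAIC() on the directly fitted model runs through without error.
step_top <- stepAIC(fit_top, direction = "backward", trace = 0)
```

If this reading is right, there is no hidden data parameter in stepAIC(); the data is looked up again by name through the stored call.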
OLD: I have data in a data frame df that can be divided into several subgroups. To do this, I created a group ID named df$id. lm() returns the coefficients as expected for the first subgroup. I want to do stepwise regression using AIC as the criterion for each subgroup separately. I use lmList {lme4}, which gives one model per subgroup (id). But if I use stepAIC {MASS} on the list elements, it throws an error. See below.
So the question is: what is wrong with my procedure/syntax? I get results for individual models, but not for those created with lmList. Does lmList() store different model information than lm()?
But the help says: class "lmList": a list of objects of class lm with a common model.
    >lme4.list.lm<-lmList(formula=Scherkraft.N~Gap.um+Standoff.um+Voidflaeche.px |df$id,data = df)
    >lme4.list.lm[[1]]
    Call: lm(formula = formula, data = data)

    Coefficients:
       (Intercept)          Gap.um     Standoff.um  Voidflaeche.px
         62.306133       -0.009878        0.026317       -0.015048

    >stepAIC(lme4.list.lm[[1]], direction="backward")
Evidently something goes wrong with the list, but I have no idea what. I tried the same thing with the base package, which produces the same model (at least the same coefficients). The results are shown below:
    # id is in order, so should be the same subgroup as for the first list element in lmList
    >lin.model<-lm(Scherkraft.N ~ Gap.um + Standoff.um + Voidflaeche.px,df[which(df$id==1),])

    Coefficients:
       (Intercept)          Gap.um     Standoff.um  Voidflaeche.px
         62.306133       -0.009878        0.026317       -0.015048
Well, this is what I get with stepAIC on my lin.model. As far as I know, the Akaike information criterion can be used to assess which model strikes the better balance between fit and generalization, given some data.
    >stepAIC(lin.model,direction="backward")
    Start:  AIC=295.12
    Scherkraft.N ~ Gap.um + Standoff.um + Voidflaeche.px

                     Df Sum of Sq    RSS    AIC
    - Standoff.um     1      2.81 7187.3 293.14
    - Gap.um          1     29.55 7214.0 293.37
    <none>                        7184.4 295.12
    - Voidflaeche.px  1    604.38 7788.8 297.97

    Step:  AIC=293.14
    Scherkraft.N ~ Gap.um + Voidflaeche.px

                     Df Sum of Sq    RSS    AIC
    - Gap.um          1     28.51 7215.8 291.38
    <none>                        7187.3 293.14
    - Voidflaeche.px  1    717.63 7904.9 296.85

    Step:  AIC=291.38
    Scherkraft.N ~ Voidflaeche.px

                     Df Sum of Sq    RSS    AIC
    <none>                        7215.8 291.38
    - Voidflaeche.px  1    795.46 8011.2 295.65

    Call:
    lm(formula = Scherkraft.N ~ Voidflaeche.px, data = df[which(df$id == 1), ])

    Coefficients:
       (Intercept)  Voidflaeche.px
           71.7183         -0.0151
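As a sanity check on the AIC column, the printed values can be reproduced from the RSS column alone, using the form of AIC that extractAIC() documents for linear models: n * log(RSS / n) + 2 * (number of estimated coefficients). The subgroup size n = 60 is NOT stated anywhere above; it is an assumption, chosen because it is the value that makes the numbers agree. The helper aic_lm is a made-up name.

```r
# ASSUMPTION: the first subgroup has 60 observations (inferred, not given).
n_obs <- 60

# AIC as used by extractAIC() for lm fits, up to an additive constant.
aic_lm <- function(rss, n_params, n_obs) {
  n_obs * log(rss / n_obs) + 2 * n_params
}

# RSS values copied from the "<none>" rows of the three tables above.
aic_full        <- aic_lm(7184.4, n_params = 4, n_obs = n_obs)  # all three regressors
aic_no_standoff <- aic_lm(7187.3, n_params = 3, n_obs = n_obs)  # Standoff.um dropped
aic_final       <- aic_lm(7215.8, n_params = 2, n_obs = n_obs)  # only Voidflaeche.px

print(round(c(aic_full, aic_no_standoff, aic_final), 2))
```

Under that assumption the three values agree with the printed 295.12, 293.14 and 291.38, which suggests each table row is simply the AIC of the model with that one term removed.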
I read from the output that I should use the model Scherkraft.N ~ Voidflaeche.px, because it has the minimum AIC. It would be nice if someone could briefly describe how the procedure gets there. My understanding of stepwise regression (with backward elimination) is that all regressors are included in the initial model. Then the least important one is eliminated, with AIC as the decision criterion, and so on. Somehow I have trouble interpreting the tables correctly, so it would be nice if someone could confirm my interpretation: "-" (minus) marks the regressor that is excluded. At the top is the "starting" model, and the table below it gives RSS and AIC for each possible exclusion. So the first row of the first table says that the model Scherkraft.N ~ Gap.um + Standoff.um + Voidflaeche.px without Standoff.um results in an AIC of 293.14. Hence the model without Standoff.um is chosen: Scherkraft.N ~ Gap.um + Voidflaeche.px
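To check this reading of the tables, here is a small sketch on simulated data (not my real data; dat, full and reduced are made-up names). It compares one row of the table produced by dropterm() from MASS, which has the same columns as the stepAIC tables, against the AIC of the model refitted by hand without that term.

```r
library(MASS)

set.seed(42)
dat <- data.frame(x1 = rnorm(60), x2 = rnorm(60), x3 = rnorm(60))
dat$y <- 3 - 0.5 * dat$x3 + rnorm(60)

full <- lm(y ~ x1 + x2 + x3, data = dat)

# One elimination step: RSS and AIC for each single-term deletion.
tab <- dropterm(full)
print(tab)

# Refit by hand without x2 and compute its AIC two ways.
reduced <- update(full, . ~ . - x2)
aic_reduced <- extractAIC(reduced)[2]      # second element is the AIC

n <- nrow(dat)
rss <- sum(residuals(reduced)^2)
aic_by_hand <- n * log(rss / n) + 2 * length(coef(reduced))

# AIC listed in the table row for x2.
aic_in_table <- tab[rownames(tab) == "x2", "AIC"]
print(c(aic_in_table, aic_reduced, aic_by_hand))
```

If the three numbers coincide, the row for a term is the AIC of the model without that term, and the row with the smallest AIC decides the next step.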
EDIT:
I replaced lmList {lme4} with dlply() to create the list of models. stepAIC still cannot cope with the list, but it gives a different error. I now believe the problem lies with the data that stepAIC has to step through. I was wondering how it can calculate the AIC value for each step from the model object alone. I would take the original data, rebuild the model leaving out one regressor at a time, and then calculate and compare the AIC. So how does stepAIC work if it does not have access to the source data? (I cannot see a parameter for passing the original data to stepAIC.) And I do not know why it works with a plain model but not with a model wrapped in a list.
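One possible workaround I can think of, sketched on simulated data (dat, groups, models and selected are made-up names, and this is only an assumption about what would help, not a confirmed fix). The idea is to build each model so that its stored call refers only to objects that are still visible later: bquote() substitutes the group value into the call before lm() is evaluated, so the call mentions `dat` rather than a local function argument.

```r
library(MASS)

set.seed(7)
dat <- data.frame(id = rep(1:3, each = 20),
                  x1 = rnorm(60), x2 = rnorm(60))
dat$y <- 1 + dat$x1 + rnorm(60)

groups <- unique(dat$id)

# Build one model per group. .(g) is replaced by the group value, so the
# stored call reads e.g. lm(y ~ x1 + x2, data = dat[dat$id == 1L, ]).
models <- lapply(groups, function(g) {
  eval(bquote(lm(y ~ x1 + x2, data = dat[dat$id == .(g), ])))
})
print(models[[1]]$call)

# Apply stepAIC to each element. The wrapper function is defined here, so
# `dat` can be found again when stepAIC refits the reduced models.
selected <- lapply(models, function(m) {
  stepAIC(m, direction = "backward", trace = 0)
})

print(lapply(selected, formula))
```

The design choice here is to keep the subsetting expression inside the call itself instead of passing an already subsetted data frame under a local name.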
    >model.list.all <- dlply(df, .(id), function(x) {return(lm(Scherkraft.N~Gap.um+Standoff.um+Voidflaeche.px,data=x)) })
    >stepAIC(model.list.all[[1]])
    Start:  AIC=295.12
    Scherkraft.N ~ Gap.um + Standoff.um + Voidflaeche.px

                     Df Sum of Sq    RSS    AIC
    - Standoff.um     1      2.81 7187.3 293.14
    - Gap.um          1     29.55 7214.0 293.37
    <none>                        7184.4 295.12
    - Voidflaeche.px  1    604.38 7788.8 297.97

    Error in is.data.frame(data) : object 'x' not found