Linear Model (lm) with dependent variable being a factor / categorical variable

Question


I want to do linear regression using the lm function (or another function, if one works). My dependent variable is a factor called AccountStatus:

1: 0 days in arrears, 2: 30-60 days in arrears, 3: 60-90 days in arrears and 4: 90+ days in arrears (four levels).

As independent variables, I have several numeric variables: loan amount, income payable and interest rate.

Is it possible to do linear regression with these variables? I searched the Internet and found material on dummy variables, but it all concerned categorical independent variables, not a categorical dependent variable.

This did not work:

 fit <- lm(factor(AccountStatus) ~ OriginalLoanToValue, data = mydata)
 summary(fit)


r r-factor lm

Tim_Utrecht Mar 05 '14 at 9:00


2 answers

If you can assign a meaningful numeric value to each level of the variable, then you may have a solution. Recode the level labels as numbers, and then convert the variable to numeric. Here's how:

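 library(plyr)  # provides revalue()
 # Recode the factor labels as numbers (still stored as text at this point)
 my.data2$islamic_leviathan_score <- revalue(my.data2$islamic_leviathan,
     c("(1) Very Suitable" = "3", "(2) Suitable" = "2",
       "(3) Somewhat Suitable" = "1", "(4) Not Suitable At All" = "-1"))
 # Go through as.character() so the labels, not the internal level codes, become the numbers
 my.data2$islamic_leviathan_score_1 <- as.numeric(as.character(my.data2$islamic_leviathan_score))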

This recodes the possible values while converting the variable to numeric. The results are consistent with the original values in the data set, where the variables were factors. You can use this approach to rename the levels to whatever you like and then convert them to numeric variables.

Finally, it's worth doing because it lets you draw histograms or run regressions, which is not possible with factor variables.
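The reason for converting through as.character() can be seen in a minimal sketch. The factor `score` below is made up for illustration and is not part of the data set above; it only shows that calling as.numeric() directly on a factor returns the internal level codes rather than the labels.

```r
# Hypothetical factor whose labels are numbers stored as text
score <- factor(c("3", "-1", "2", "3", "1"))

# Direct conversion returns the internal integer level codes,
# which do not match the labels
codes <- as.numeric(score)

# Converting to character first recovers the labels: 3 -1 2 3 1
values <- as.numeric(as.character(score))

print(codes)
print(values)
```

Only `values` is safe to pass to hist() or to use in a regression, and only if the numbers assigned to the levels are meaningful as a scale.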

Hope this helps!


saladin1991 Dec 20 '16 at 7:59


Maxim.K · Accepted Answer · 2014-03-05T09:39:37+0000

Linear regression does not accept a categorical variable as the dependent variable; it must be continuous. Given that your AccountStatus variable has only four levels, it is not really feasible to treat it as continuous. Before embarking on any statistical analysis, you need to know the measurement levels of your variables.

What you can do is use multinomial logistic regression, for example, here. Alternatively, you can recode AccountStatus as dichotomous and use simple logistic regression.
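Both options might look roughly like the sketch below. It makes several assumptions: the data frame is simulated (only the column names AccountStatus and OriginalLoanToValue come from the question), multinom() from the nnet package is one of several ways to fit a multinomial model in R, and the "any arrears versus none" split is just one possible dichotomous recoding.

```r
# Simulated stand-in for mydata; the values are made up for illustration only
set.seed(1)
n <- 200
mydata <- data.frame(
  OriginalLoanToValue = runif(n, 40, 100),
  AccountStatus = factor(sample(1:4, n, replace = TRUE))
)

# Option 1: multinomial logistic regression.
# nnet ships with R as a recommended package; level "1" is the reference category.
library(nnet)
fit_multi <- multinom(AccountStatus ~ OriginalLoanToValue,
                      data = mydata, trace = FALSE)
print(summary(fit_multi))

# Option 2: recode to a dichotomous outcome (assumed split: any arrears vs. none)
# and fit an ordinary logistic regression with glm().
mydata$InArrears <- as.integer(mydata$AccountStatus != "1")
fit_bin <- glm(InArrears ~ OriginalLoanToValue,
               data = mydata, family = binomial)
print(summary(fit_bin))
```

The multinomial fit returns one set of coefficients per non-reference level, each describing the log-odds of that level relative to level 1. The binary fit returns a single intercept and slope.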

Sorry to disappoint you, but this is an inherent limitation of multiple regression and has nothing to do with R. If you want to know more about which statistical technique suits different combinations of measurement levels of the dependent and independent variables, I can wholeheartedly recommend this book.
