How to create a dummy variable in R?

So, my data set consists of 15 variables, one of which (gender) has only 2 levels. I want to use it as a dummy variable, but its levels are coded 1 and 2. How do I do this? I want to have levels 0 and 1, but I don't know how to manage this in R!

+6
3 answers

With most R modeling tools that have a formula interface, you don't need to create dummy variables; the underlying code that processes and interprets the formula will do it for you. If you need a dummy variable for some other reason, there are several options. The easiest (IMHO) is to use model.matrix():

    set.seed(1)
    dat <- data.frame(sex = sample(c("male","female"), 10, replace = TRUE))
    model.matrix( ~ sex - 1, data = dat)

which gives:

    > dummy <- model.matrix( ~ sex - 1, data = dat)
    > dummy
       sexfemale sexmale
    1          0       1
    2          0       1
    3          1       0
    4          1       0
    5          0       1
    6          1       0
    7          1       0
    8          1       0
    9          1       0
    10         0       1
    attr(,"assign")
    [1] 1 1
    attr(,"contrasts")
    attr(,"contrasts")$sex
    [1] "contr.treatment"

    > dummy[,1]
     1  2  3  4  5  6  7  8  9 10
     0  0  1  1  0  1  1  1  1  0

You can use either column of dummy as a numeric dummy variable; choose the column for whichever class you want coded as 1. dummy[,1] codes the female class as 1, and dummy[,2] codes the male class as 1.

Convert this to a factor if you want it to be interpreted as a categorical object:

    > factor(dummy[, 1])
     1  2  3  4  5  6  7  8  9 10
     0  0  1  1  0  1  1  1  1  0
    Levels: 0 1

But that defeats the purpose of a factor; what does 0 stand for again?
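One way to keep the meaning of 0 and 1 visible is to attach labels when building the factor. The following is only a sketch on made-up data; the names `is_female` and `sex_f` are hypothetical and do not come from the answer above.

```r
# Hypothetical example data; the values are made up for illustration.
dat <- data.frame(sex = c("male", "male", "female", "female", "male"))

# Numeric 0/1 dummy: 1 marks "female", 0 marks "male".
is_female <- as.integer(dat$sex == "female")
is_female

# The same information as a factor with self-describing labels.
sex_f <- factor(is_female, levels = c(0, 1), labels = c("male", "female"))
sex_f

# The 0/1 coding can always be recovered from the labelled factor.
as.integer(sex_f == "female")
```

With labels attached, printed output and model summaries show "male" and "female" rather than a bare 0 or 1.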

+21

Try this:

    set.seed(001) # generating some data
    sex <- factor(sample(1:2, 10, replace=TRUE)) # this is what you have
    # [1] 1 1 2 2 1 2 2 2 2 1
    # Levels: 1 2

    sex <- factor(ifelse(as.numeric(sex)==2, 1, 0)) # this is what you want
    sex
    # [1] 0 0 1 1 0 1 1 1 1 0
    # Levels: 0 1

If you want the labels to be 0 = Male and 1 = Female, then:

    sex <- factor(ifelse(as.numeric(sex)==2, 1, 0), labels=c('M', 'F'))
    sex # this is what you want
    # [1] M M F F M F F F F M
    # Levels: M F

In fact, you do not need to create a dummy variable to fit a model with lm; see the following example:

    set.seed(001) # Generating some data
    N <- 100
    x <- rnorm(N, 50, 20)
    y <- 20 + 3.5*x + rnorm(N)
    sex <- factor(sample(1:2, N, replace=TRUE))

    # Estimating the linear model
    lm(y ~ x + sex) # using the first category as the baseline (this means sex==1)

    # Call:
    # lm(formula = y ~ x + sex)
    #
    # Coefficients:
    # (Intercept)            x         sex2
    #    19.97815      3.49994     -0.02719

    # renaming the categories and labelling them
    sex <- factor(ifelse(as.numeric(sex)==2, 1, 0), labels=c('M', 'F'))
    lm(y ~ x + sex) # the same results, baseline is 'Male'

    # Call:
    # lm(formula = y ~ x + sex)
    #
    # Coefficients:
    # (Intercept)            x         sexF
    #    19.97815      3.49994     -0.02719

As you can see, R handles the dummy coding itself: you just pass the variable into the formula as a factor, and R will do the rest for you.

By the way, there is no need to change the categories from c(1,2) to c(0,1); the results will be the same, as you can see in the example above.
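To see the 0/1 column that the formula machinery builds from a factor, you can inspect the contrasts and the design matrix directly. This is a minimal sketch on a small made-up factor, assuming R's default treatment contrasts for unordered factors; the name `X` is hypothetical.

```r
# Hypothetical two-level factor coded 1/2, as in the question.
sex <- factor(c(1, 1, 2, 2, 1, 2))

# The contrast matrix shows how each level maps to the dummy column.
contrasts(sex)

# The design matrix that a formula such as y ~ sex would use internally.
X <- model.matrix(~ sex)
X
```

The column `sex2` is 0 for the baseline level 1 and 1 for level 2, which is the dummy variable lm works with behind the scenes.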

+9

As suggested by many above, turn it into a factor.

If you really want to dummy-code the gender variable, consider this:

    set.seed(100)
    gender = rbinom(100,1,0.5)+1
    gender_dummy = gender-1
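Subtracting 1 relies on the codes being exactly 1 and 2. As a hedged alternative, a comparison spells out which value becomes 1; the data and the name `gender_dummy2` below are made up for illustration.

```r
# Hypothetical gender codes; 1 and 2 as in the question.
gender <- c(1, 2, 2, 1, 2)

# Subtracting 1 works only because the codes are exactly 1 and 2.
gender_dummy <- gender - 1

# A comparison states the rule explicitly: 1 where gender is 2, else 0.
gender_dummy2 <- as.integer(gender == 2)

# Both approaches agree on these data.
all(gender_dummy == gender_dummy2)
```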
+1

Source: https://habr.com/ru/post/927476/