With most R modeling tools that have a formula interface, you don't need to create dummy variables yourself; the underlying code that processes and interprets the formula will do it for you. If you need a dummy variable for some other reason, there are several options. The easiest (IMHO) is to use model.matrix():
    set.seed(1)
    dat <- data.frame(sex = sample(c("male","female"), 10, replace = TRUE))
    model.matrix( ~ sex - 1, data = dat)
which gives:
    > dummy <- model.matrix( ~ sex - 1, data = dat)
    > dummy
       sexfemale sexmale
    1          0       1
    2          0       1
    3          1       0
    4          1       0
    5          0       1
    6          1       0
    7          1       0
    8          1       0
    9          1       0
    10         0       1
    attr(,"assign")
    [1] 1 1
    attr(,"contrasts")
    attr(,"contrasts")$sex
    [1] "contr.treatment"

    > dummy[,1]
     1  2  3  4  5  6  7  8  9 10 
     0  0  1  1  0  1  1  1  1  0 
You can use either column of dummy as a numeric dummy variable; pick the column whose class you want coded as 1. dummy[,1] codes the female class as 1 and dummy[,2] codes the male class as 1.
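Selecting the column by name rather than by position makes it harder to pick the wrong class by accident. A minimal sketch of that idea follows; the data are typed in by hand to match the output shown above (an assumption made here because the result of sample() after set.seed(1) differs between R versions), and the names female and male are only illustrative.

```r
# Data hard-coded to match the output shown above; sample() after
# set.seed(1) gives different draws in different R versions.
dat <- data.frame(sex = c("male", "male", "female", "female", "male",
                          "female", "female", "female", "female", "male"))

dummy <- model.matrix(~ sex - 1, data = dat)

# Select each indicator by column name instead of by position
female <- dummy[, "sexfemale"]   # 1 where sex == "female", 0 otherwise
male   <- dummy[, "sexmale"]     # 1 where sex == "male", 0 otherwise

# The two indicators are complementary: exactly one is 1 in every row
all(female + male == 1)
```

The column names are built from the variable name followed by the level, so they stay correct even if the level order changes.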
Wrap it in factor() if you want it to be treated as a categorical object:
    > factor(dummy[, 1])
     1  2  3  4  5  6  7  8  9 10 
     0  0  1  1  0  1  1  1  1  0 
    Levels: 0 1
But that defeats the object of a factor; what else is 0?
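If a factor is wanted anyway, one way to keep it meaningful is to attach labels to the 0/1 codes. This is only a sketch under assumptions: the data are typed in by hand to match the output above, and the label text and the name female are illustrative choices, not part of the original answer.

```r
# Data hard-coded to match the output shown above
dat <- data.frame(sex = c("male", "male", "female", "female", "male",
                          "female", "female", "female", "female", "male"))

dummy <- model.matrix(~ sex - 1, data = dat)

# A factor built directly from the 0/1 column only knows "0" and "1"
levels(factor(dummy[, "sexfemale"]))

# Supplying labels says what each code stands for
female <- factor(dummy[, "sexfemale"], levels = c(0, 1),
                 labels = c("male", "female"))
levels(female)

# With only two classes this carries the same information as the original column
all(as.character(female) == dat$sex)
```

With two classes the labelled factor ends up equivalent to factor(dat$sex), which is a hint that the round trip through a dummy is rarely needed.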