Here is another approach. Note that I chose to simplify the subcategories in var2 before converting them into binary indicator (dummy) variables, to reduce redundancy:
Input:
data <- read.table(header = TRUE, text = '
year var1   var2
2009 000000 00000001
2010 000000 00000001
2009 000000 00000002
2010 000000 00000002
2009 000000 00000003
2009 100000 10000001
2009 100000 10000004
2010 100000 10000010
', colClasses = c('character', 'character', 'character'))
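The colClasses argument matters here. The sketch below reads a two-row subset of var2 with and without it; without colClasses, read.table() converts the codes to integers and drops the leading zeros, so the two-character suffix is no longer consistent. The subset and the names txt, as_num and as_chr are illustrative only.

```r
# Sketch: compare reading var2 with and without colClasses = "character".
subCat <- function(s) substr(s, nchar(s) - 1, nchar(s))

txt <- "var2
00000001
10000001"

# Default: the column is converted to integers, so 00000001 becomes 1
as_num <- read.table(header = TRUE, text = txt)
# With colClasses the values stay as the original strings
as_chr <- read.table(header = TRUE, text = txt, colClasses = "character")

subCat(as_num$var2)  # "1" "01": the same subcategory is split in two
subCat(as_chr$var2)  # "01" "01": consistent codes
```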
Simplify the var2 column by keeping only its last two characters:
# Keep only the last two characters of each var2 code
subCat <- function(s) {
  substr(s, nchar(s) - 1, nchar(s))
}
data$var2 <- subCat(data$var2)
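To see what the simplification buys, here is subCat applied to the var2 values from the input. In this sample the first six characters of var2 repeat var1 exactly, so only the last two characters carry new information, and the number of distinct codes (and hence dummy columns) drops from 6 to 5. The check is specific to this sample data, not a general property of the columns.

```r
# Sketch: subCat on the var2 values from the Input step.
subCat <- function(s) substr(s, nchar(s) - 1, nchar(s))

var1 <- c("000000", "000000", "000000", "000000",
          "000000", "100000", "100000", "100000")
var2 <- c("00000001", "00000001", "00000002", "00000002",
          "00000003", "10000001", "10000004", "10000010")

# In this sample the first six characters of var2 duplicate var1
all(substr(var2, 1, 6) == var1)   # TRUE

short <- subCat(var2)
short                  # "01" "01" "02" "02" "03" "01" "04" "10"
length(unique(var2))   # 6 distinct original codes
length(unique(short))  # 5 distinct codes, i.e. 5 dummy columns
```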
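Creating dummy variables (each method below starts from the simplified data above, not from the result of the previous method):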
Method 1:
# Cross-tabulate row index against var2: each row gets a 1 in the column of its code
tab <- table(seq_along(data$var2), data$var2)
data <- cbind(data, as.data.frame.matrix(tab))
data$var2 <- NULL
Output:
  year   var1 01 02 03 04 10
1 2009 000000  1  0  0  0  0
2 2010 000000  1  0  0  0  0
3 2009 000000  0  1  0  0  0
4 2010 000000  0  1  0  0  0
5 2009 000000  0  0  1  0  0
6 2009 100000  1  0  0  0  0
7 2009 100000  0  0  0  1  0
8 2010 100000  0  0  0  0  1
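Base R's model.matrix() can build the same indicator columns without the table() step. This is a swapped-in technique, not part of the original answer; the sketch rebuilds the simplified sample data inline, and the "- 1" in the formula drops the intercept so every level of var2 gets its own 0/1 column.

```r
# Sketch: one-hot encoding with model.matrix() instead of table().
data <- data.frame(
  year = c("2009", "2010", "2009", "2010", "2009", "2009", "2009", "2010"),
  var1 = c("000000", "000000", "000000", "000000", "000000", "100000", "100000", "100000"),
  var2 = c("01", "01", "02", "02", "03", "01", "04", "10"),
  stringsAsFactors = FALSE
)

# No intercept: one 0/1 column per level of var2
mm <- model.matrix(~ factor(var2) - 1, data = data)
# Columns are named "factor(var2)01", ...; strip the prefix
colnames(mm) <- sub("^factor\\(var2\\)", "", colnames(mm))

out <- cbind(data[c("year", "var1")], mm)
```

Note that model.matrix() returns doubles, whereas the table() approach yields integer counts.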
==============================================================
Method 2:
# Note: the dummies package has since been archived on CRAN and may need to be installed from the archive
library(dummies)
data3 <- cbind(data, dummy(data$var2))
data3$var2 <- NULL
Output:
  year   var1 data01 data02 data03 data04 data10
1 2009 000000      1      0      0      0      0
2 2010 000000      1      0      0      0      0
3 2009 000000      0      1      0      0      0
4 2010 000000      0      1      0      0      0
5 2009 000000      0      0      1      0      0
6 2009 100000      1      0      0      0      0
7 2009 100000      0      0      0      1      0
8 2010 100000      0      0      0      0      1
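If the dummies package is not available, a small base-R helper can produce similar columns. one_hot() below is a hypothetical helper, not the dummies API; it compares every value with every sorted level via outer(), and the prefix argument only mirrors the column names shown in the output above.

```r
# Sketch: hypothetical base-R stand-in for dummy(); not the dummies package's API.
one_hot <- function(x, prefix = "") {
  lv <- sort(unique(x))
  # outer() compares each element of x with each level: an n x k logical matrix
  m <- outer(x, lv, "==") * 1L
  colnames(m) <- paste0(prefix, lv)
  m
}

var2 <- c("01", "01", "02", "02", "03", "01", "04", "10")
m <- one_hot(var2, prefix = "data")
colnames(m)   # "data01" "data02" "data03" "data04" "data10"
```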
==============================================================
Method 3:
# One numeric 0/1 column per distinct value of var2
dummies <- sapply(unique(data$var2), function(x) as.numeric(data$var2 == x))
data <- cbind(data, dummies)
data$var2 <- NULL
Output:
  year   var1 X01 X02 X03 X04 X10
1 2009 000000   1   0   0   0   0
2 2010 000000   1   0   0   0   0
3 2009 000000   0   1   0   0   0
4 2010 000000   0   1   0   0   0
5 2009 000000   0   0   1   0   0
6 2009 100000   1   0   0   0   0
7 2009 100000   0   0   0   1   0
8 2010 100000   0   0   0   0   1
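One detail of this method: unique() returns values in order of first appearance, so the column order depends on the row order. In the sample data the first appearances happen to be sorted (01, 02, 03, 04, 10). The sketch below uses a small made-up vector to show the difference, and sort() as one way to get a stable order.

```r
# Sketch: sapply over unique() orders columns by first appearance.
x <- c("10", "01", "10", "02")

by_appearance <- sapply(unique(x), function(lv) as.numeric(x == lv))
by_level      <- sapply(sort(unique(x)), function(lv) as.numeric(x == lv))

colnames(by_appearance)  # "10" "01" "02"
colnames(by_level)       # "01" "02" "10"
```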
Synergist