Freezing values of several variables in a data frame using filters

Question


I am trying to compute a few new variables in my data frame, for example ones that hold the initial values:

Let's say I have:

    Dataset <- data.frame(time=rep(c(1990:1992),2), geo=c(rep("AT",3),rep("DE",3)),var1=c(1:6), var2=c(7:12))

      time geo var1 var2
    1 1990  AT    1    7
    2 1991  AT    2    8
    3 1992  AT    3    9
    4 1990  DE    4   10
    5 1991  DE    5   11
    6 1992  DE    6   12

I want to get:

      time geo var1 var2 var1_1990 var1_1991 var2_1990 var2_1991
    1 1990  AT    1    7         1         2         7         8
    2 1991  AT    2    8         1         2         7         8
    3 1992  AT    3    9         1         2         7         8
    4 1990  DE    4   10         4         5        10        11
    5 1991  DE    5   11         4         5        10        11
    6 1992  DE    6   12         4         5        10        11

So the new variables vary over both the year and the source variable. Here is my attempt:

    initialyears <- c(1990,1991)
    initialvars <- c("var1", "var2")
    # ideally, I want code where I only have to change these two vectors
    # and where it is possible to change their dimensions

    for (i in initialyears){
      lapply(initialvars,function(x){
        rep(Dataset[Dataset$time==i,x],each=length(unique(Dataset$time)))
      })}

The code runs without errors but produces nothing. I would like to name the variables as in the example (for instance "var1_1990") and make the new variables part of the data frame straight away. I would also like to avoid the for loop, but I don't know how to wrap two lapply calls around this function. Should I instead write a function that takes two arguments? Is the problem that lapply does not carry its results into my environment? I have been stuck here for a while, so I would be grateful for any help!

PS: I have a solution that builds one combination at a time, without lapply and the like, but I am trying to get away from copying and pasting:

 Dataset$var1_1990 <- c(rep(Dataset$var1[which(Dataset$time==1990)], each=length(unique(Dataset$time))))

+7

r dataframe lapply

Peter Pan Apr 29 '15 at 13:02


3 answers

Here is a data.table method:

    require(data.table)
    dt <- as.data.table(Dataset)

    in_cols = c("var1", "var2")
    out_cols = do.call("paste", c(CJ(in_cols, unique(dt$time)), sep="_"))

    dt[, (out_cols) := unlist(lapply(.SD, as.list), FALSE), by=geo, .SDcols=in_cols]
    #    time geo var1 var2 var1_1990 var1_1991 var1_1992 var2_1990 var2_1991 var2_1992
    # 1: 1990  AT    1    7         1         2         3         7         8         9
    # 2: 1991  AT    2    8         1         2         3         7         8         9
    # 3: 1992  AT    3    9         1         2         3         7         8         9
    # 4: 1990  DE    4   10         4         5         6        10        11        12
    # 5: 1991  DE    5   11         4         5         6        10        11        12
    # 6: 1992  DE    6   12         4         5         6        10        11        12

This assumes that the time variable is identical (and in the same order) for each geo value.
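One way to check that assumption before running the code above is sketched here in base R. The names `time_by_geo` and `same_times` are made up for illustration, and `Dataset` is the data frame from the question.

```r
# Data frame from the question
Dataset <- data.frame(time = rep(1990:1992, 2),
                      geo  = c(rep("AT", 3), rep("DE", 3)),
                      var1 = 1:6,
                      var2 = 7:12)

# One vector of time values per geo, in row order
time_by_geo <- split(Dataset$time, Dataset$geo)

# TRUE only if every geo lists exactly the same times in the same order
same_times <- all(vapply(time_by_geo,
                         function(x) identical(x, time_by_geo[[1]]),
                         logical(1)))
same_times
```

If `same_times` comes back `FALSE`, sorting by geo and time or filling in the missing years first would be needed before the grouped assignment lines up.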

+2

Arun Apr 29 '15 at 15:28


Using dplyr and tidyr and using a custom function, try the following:

Data

 Dataset <- data.frame(time=rep(c(1990:1992),2), geo=c(rep("AT",3),rep("DE",3)),var1=c(1:6), var2=c(7:12))

Code

    library(dplyr); library(tidyr)

    intitialyears <- c(1990,1991)
    intitialvars <- c("var1", "var2")

    # create this function
    myTranForm <- function(dataSet, varName, years){
      temp <- dataSet %>%
        select(time, geo, eval(parse(text=varName))) %>%
        filter(time %in% years) %>%
        mutate(time=paste(varName, time, sep="_"))
      names(temp)[names(temp) %in% varName] <- "someRandomStringForVariableName"
      temp <- temp %>% spread(time, someRandomStringForVariableName)
      return(temp)
    }

    # then lapply on intitialvars using the custom function
    DatasetList <- lapply(intitialvars, function(x) myTranForm(Dataset, x, intitialyears))

    # and loop over the data frames in the list
    for(i in 1:length(intitialvars)){
      Dataset <- left_join(Dataset, DatasetList[[i]])
    }
    Dataset

+1

dimitris_ps Apr 29 '15 at 14:22


bgoldst · Accepted Answer · 2015-04-29T13:37:10+0000

This can be done using subset() , reshape() and merge() :

    merge(Dataset,reshape(subset(Dataset,time%in%c(1990,1991)),dir='w',idvar='geo',sep='_'));
    ##   geo time var1 var2 var1_1990 var2_1990 var1_1991 var2_1991
    ## 1  AT 1990    1    7         1         7         2         8
    ## 2  AT 1991    2    8         1         7         2         8
    ## 3  AT 1992    3    9         1         7         2         8
    ## 4  DE 1990    4   10         4        10         5        11
    ## 5  DE 1991    5   11         4        10         5        11
    ## 6  DE 1992    6   12         4        10         5        11

The order of the columns is not what you have in your question, but if necessary you can fix it afterwards by indexing the columns in the order you want.
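As a rough sketch of how that could look, the version below drives the same reshape-and-merge idea from the two vectors in the question and then reorders the columns by name. The names `initialyears`, `initialvars`, `base_rows`, `wide`, `new_cols` and `result` are illustrative, and selecting only `initialvars` before reshaping is an assumption added here so that other columns do not get widened.

```r
# Data frame from the question
Dataset <- data.frame(time = rep(1990:1992, 2),
                      geo  = c(rep("AT", 3), rep("DE", 3)),
                      var1 = 1:6,
                      var2 = 7:12)

# The only two things to change (names borrowed from the question)
initialyears <- c(1990, 1991)
initialvars  <- c("var1", "var2")

# Keep only the rows and columns that feed the new variables
base_rows <- Dataset[Dataset$time %in% initialyears,
                     c("geo", "time", initialvars)]

# One row per geo, one column per variable/year pair (e.g. var1_1990)
wide <- reshape(base_rows, direction = "wide",
                idvar = "geo", timevar = "time", sep = "_")

# Attach the wide columns to every row of the same geo
result <- merge(Dataset, wide, by = "geo")

# Rebuild the column order from the question by indexing on names
new_cols <- paste(rep(initialvars, each = length(initialyears)),
                  initialyears, sep = "_")
result <- result[, c("time", "geo", initialvars, new_cols)]
names(result)
```

Indexing by name rather than by position keeps the reordering step valid when the two vectors change length.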
