Freezing the values of several variables at given years as new columns in a data frame

I am trying to compute a few new variables in my data frame that hold the values of existing variables in some initial years. Let's say I have:

    Dataset <- data.frame(time = rep(1990:1992, 2),
                          geo  = c(rep("AT", 3), rep("DE", 3)),
                          var1 = 1:6,
                          var2 = 7:12)
    Dataset

      time geo var1 var2
    1 1990  AT    1    7
    2 1991  AT    2    8
    3 1992  AT    3    9
    4 1990  DE    4   10
    5 1991  DE    5   11
    6 1992  DE    6   12

I want to get:

      time geo var1 var2 var1_1990 var1_1991 var2_1990 var2_1991
    1 1990  AT    1    7         1         2         7         8
    2 1991  AT    2    8         1         2         7         8
    3 1992  AT    3    9         1         2         7         8
    4 1990  DE    4   10         4         5        10        11
    5 1991  DE    5   11         4         5        10        11
    6 1992  DE    6   12         4         5        10        11

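That is, each new column holds the value of one variable in one initial year, repeated for every row of the same geo, so both the variable and the year change across the new columns. Here is my attempt: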

    initialyears <- c(1990, 1991)
    initialvars  <- c("var1", "var2")
    # ideally, I want code where I only have to change these two vectors
    # and where it is possible to change their lengths
    for (i in initialyears) {
      lapply(initialvars, function(x) {
        rep(Dataset[Dataset$time == i, x], each = length(unique(Dataset$time)))
      })
    }

The code runs without errors, but it produces no output. I would like the new variables to be named as in the example (e.g. "var1_1990") and to be added directly to the data frame. I would also like to avoid the for loop, but I don't know how to wrap two apply calls around this function. Should I write a function that takes two arguments? Or is the problem that apply functions do not carry their results into my environment? I have been stuck here for a while, so I will be grateful for any help!
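Both symptoms can be reproduced in a few lines. A `for` loop always returns `NULL`, so the list built by `lapply()` in each iteration is computed and then discarded; and an assignment made inside the function passed to `lapply()` only changes that function's local copy of `Dataset`. A minimal sketch using the data above (the `var1_copy` column is purely illustrative):

```r
Dataset <- data.frame(time = rep(1990:1992, 2),
                      geo  = c(rep("AT", 3), rep("DE", 3)),
                      var1 = 1:6, var2 = 7:12)

# 1. A for loop returns NULL (invisibly); the lapply() results are thrown away
res <- for (i in c(1990, 1991)) {
  lapply(c("var1", "var2"), function(x) Dataset[Dataset$time == i, x])
}
print(is.null(res))                     # TRUE

# 2. Assigning inside the function given to lapply() changes a local copy only
invisible(lapply("var1", function(x) {
  Dataset[[paste0(x, "_copy")]] <- Dataset[[x]]   # local copy, not the global one
  NULL
}))
print("var1_copy" %in% names(Dataset))  # FALSE

# 3. Returning the values and assigning them outside the function works
new_cols <- lapply(c(var1_copy = "var1"), function(x) Dataset[[x]])
Dataset[names(new_cols)] <- new_cols
print("var1_copy" %in% names(Dataset))  # TRUE
```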

PS: I have a working solution for a single combination that doesn't use apply and the like, but I'm trying to get away from copying and pasting it for every combination:

    Dataset$var1_1990 <- rep(Dataset$var1[which(Dataset$time == 1990)],
                             each = length(unique(Dataset$time)))
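One way to generalize that line is to turn it into a function of two arguments (variable name and year) and call it on every combination with `Map()`, base R's multi-argument counterpart of `lapply()`. A minimal sketch that keeps the same assumption as the line above (every `geo` has the same years in the same order); the helper name `freeze_col` is hypothetical:

```r
Dataset <- data.frame(time = rep(1990:1992, 2),
                      geo  = c(rep("AT", 3), rep("DE", 3)),
                      var1 = 1:6, var2 = 7:12)

initialyears <- c(1990, 1991)
initialvars  <- c("var1", "var2")

# Hypothetical helper: the copy-paste line with the variable and year as arguments
freeze_col <- function(var, year) {
  rep(Dataset[[var]][Dataset$time == year], each = length(unique(Dataset$time)))
}

# All combinations, year varying fastest: var1/1990, var1/1991, var2/1990, var2/1991
combos <- expand.grid(year = initialyears, var = initialvars,
                      stringsAsFactors = FALSE)

# Map() calls freeze_col() once per combination and returns a list of columns
new_cols <- Map(freeze_col, combos$var, combos$year)
names(new_cols) <- paste(combos$var, combos$year, sep = "_")

# Add all new columns in one assignment, outside any function
Dataset[names(new_cols)] <- new_cols
Dataset
```

Only the two vectors need to change; their lengths can differ, since `expand.grid()` builds every pairing.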
3 answers

This can be done using subset(), reshape() and merge():

    merge(Dataset, reshape(subset(Dataset, time %in% c(1990, 1991)),
                           direction = "wide", idvar = "geo", sep = "_"))
    ##   geo time var1 var2 var1_1990 var2_1990 var1_1991 var2_1991
    ## 1  AT 1990    1    7         1         7         2         8
    ## 2  AT 1991    2    8         1         7         2         8
    ## 3  AT 1992    3    9         1         7         2         8
    ## 4  DE 1990    4   10         4        10         5        11
    ## 5  DE 1991    5   11         4        10         5        11
    ## 6  DE 1992    6   12         4        10         5        11

The order of the columns is not the one in your question, but if necessary you can fix it afterwards by indexing the columns by name.
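The same approach can be driven by the two vectors from the question, with the columns then put into the requested order by name. A base-R sketch, assuming (as in the example) that `geo` identifies the groups and `time` holds the years; `base`, `wide` and `new_cols` are illustrative names:

```r
Dataset <- data.frame(time = rep(1990:1992, 2),
                      geo  = c(rep("AT", 3), rep("DE", 3)),
                      var1 = 1:6, var2 = 7:12)

initialyears <- c(1990, 1991)
initialvars  <- c("var1", "var2")

# Rows from the chosen years, restricted to the id, time and chosen variables
base <- subset(Dataset, time %in% initialyears,
               select = c("geo", "time", initialvars))

# One row per geo, one column per variable/year, named like "var1_1990"
wide <- reshape(base, direction = "wide", idvar = "geo",
                timevar = "time", sep = "_")

res <- merge(Dataset, wide, by = "geo")

# Requested column order: var1_1990, var1_1991, var2_1990, var2_1991
new_cols <- paste(rep(initialvars, each = length(initialyears)),
                  initialyears, sep = "_")

# Sort rows by geo and time, then reorder the columns by name
res <- res[order(res$geo, res$time), c(names(Dataset), new_cols)]
rownames(res) <- NULL
res
```

Selecting only `initialvars` before `reshape()` matters when the data has other columns: otherwise every remaining column would be spread into wide format too.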


Here is a data.table method:

    library(data.table)
    dt <- as.data.table(Dataset)
    in_cols  <- c("var1", "var2")
    # every variable/year combination: "var1_1990", ..., "var2_1992"
    out_cols <- do.call("paste", c(CJ(in_cols, unique(dt$time)), sep = "_"))
    # within each geo, spread each input column into one column per year
    dt[, (out_cols) := unlist(lapply(.SD, as.list), FALSE), by = geo, .SDcols = in_cols]
    dt
    #    time geo var1 var2 var1_1990 var1_1991 var1_1992 var2_1990 var2_1991 var2_1992
    # 1: 1990  AT    1    7         1         2         3         7         8         9
    # 2: 1991  AT    2    8         1         2         3         7         8         9
    # 3: 1992  AT    3    9         1         2         3         7         8         9
    # 4: 1990  DE    4   10         4         5         6        10        11        12
    # 5: 1991  DE    5   11         4         5         6        10        11        12
    # 6: 1992  DE    6   12         4         5         6        10        11        12

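This assumes that the time variable is identical (and in the same order) for each geo value. Note also that it creates a column for every year in the data (1990 to 1992), not only for the initial ones.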
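If that assumption may not hold, one alternative is to look each value up by key instead of by position: for every row, `match()` finds the row with the same `geo` and the chosen year. A base-R sketch (not data.table), assuming `geo` and `time` together identify a row; the rows are shuffled only to show that order no longer matters, and a year missing for some `geo` would give `NA`:

```r
Dataset <- data.frame(time = rep(1990:1992, 2),
                      geo  = c(rep("AT", 3), rep("DE", 3)),
                      var1 = 1:6, var2 = 7:12)

# Scramble the rows to show that the lookup does not depend on row order
shuffled <- Dataset[c(6, 1, 4, 3, 5, 2), ]

initialyears <- c(1990, 1991)
initialvars  <- c("var1", "var2")

# Key of every row, e.g. "AT_1990"
row_key <- paste(shuffled$geo, shuffled$time, sep = "_")

for (v in initialvars) {
  for (y in initialyears) {
    # For each row, the index of the row with the same geo and time == y
    idx <- match(paste(shuffled$geo, y, sep = "_"), row_key)
    shuffled[[paste(v, y, sep = "_")]] <- shuffled[[v]][idx]
  }
}
shuffled
```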


Using dplyr, tidyr and a custom function, try the following:

Data

    Dataset <- data.frame(time = rep(1990:1992, 2),
                          geo  = c(rep("AT", 3), rep("DE", 3)),
                          var1 = 1:6,
                          var2 = 7:12)

The code

    library(dplyr)
    library(tidyr)

    initialyears <- c(1990, 1991)
    initialvars  <- c("var1", "var2")

    # Create this function: one row per geo holding varName in each of the
    # given years, with columns named like "var1_1990"
    myTranForm <- function(dataSet, varName, years) {
      temp <- dataSet %>%
        select(time, geo, all_of(varName)) %>%
        filter(time %in% years) %>%
        mutate(time = paste(varName, time, sep = "_"))
      # give the value column a fixed name so spread() can refer to it
      names(temp)[names(temp) %in% varName] <- "someRandomStringForVariableName"
      temp %>% spread(time, someRandomStringForVariableName)
    }

    # Then lapply over initialvars using the custom function
    DatasetList <- lapply(initialvars, function(x) myTranForm(Dataset, x, initialyears))

    # and loop over the data frames in the list, joining each onto Dataset
    for (i in seq_along(DatasetList)) {
      Dataset <- left_join(Dataset, DatasetList[[i]], by = "geo")
    }
    Dataset
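The final `for` loop just folds the list of wide tables into `Dataset` one join at a time, which is what base R's `Reduce()` does. A sketch using base `merge(..., all.x = TRUE)` in place of `left_join()`; to keep it self-contained and package-free, the per-variable wide tables are built here with base `reshape()` as a stand-in for `myTranForm()`, not with the function above:

```r
Dataset <- data.frame(time = rep(1990:1992, 2),
                      geo  = c(rep("AT", 3), rep("DE", 3)),
                      var1 = 1:6, var2 = 7:12)

initialyears <- c(1990, 1991)
initialvars  <- c("var1", "var2")

# Stand-in for myTranForm(): one row per geo, columns like "var1_1990"
DatasetList <- lapply(initialvars, function(v) {
  sub <- Dataset[Dataset$time %in% initialyears, c("geo", "time", v)]
  reshape(sub, direction = "wide", idvar = "geo", timevar = "time", sep = "_")
})

# Reduce() replaces the loop: merge(merge(Dataset, list[[1]]), list[[2]]), ...
result <- Reduce(function(acc, wide) merge(acc, wide, by = "geo", all.x = TRUE),
                 DatasetList, Dataset)

# merge() sorts by geo; sort by geo and time for a predictable row order
result <- result[order(result$geo, result$time), ]
result
```

With the dplyr version, the same fold would be `Reduce(function(acc, wide) left_join(acc, wide, by = "geo"), DatasetList, Dataset)`.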