How to set conditions for filling certain gaps in the panel data?

I have panel data in R. data frame containing years from 1989 to 2008 for armed conflicts in different countries. However, only observations are included for countries that have experienced armed conflict in a given year.

The data set is similar to this:

df <- data.frame(c("1989","1993","1998", "1990","1995","1997"), c(rep(c(750, 135), c(3,3))), c(rep(1,6))) names(df)<-c("year","countrycode","conflict") print(df) year countrycode conflict 1 1989 750 1 2 1993 750 1 3 1998 750 1 4 1990 135 1 5 1995 135 1 6 1997 135 1 

Now I want to fill in the gaps in the panel data, BUT ONLY the gaps that do not exceed three years. For example, I want to add lines between lines 1 and 2 and between lines 5 and 7 (spacing 3 and 1 year, respectively), but neither between lines 2 and 3, nor between lines 4 and 5 (spaces are 4 years each). After this The above data frame will look like this:

 > df2 <- data.frame(c("1989","1990","1991","1992","1993","1998", + "1990","1995","1996","1997"), + c(rep(c(750, 135), c(6,4))), c(1,0,0,0,1,1,1,1,0,1)) > names(df2) <- c("year","countrycode","conflict") > print(df2) year countrycode conflict 1 1989 750 1 2 1990 750 0 3 1991 750 0 4 1992 750 0 5 1993 750 1 6 1998 750 1 7 1990 135 1 8 1995 135 1 9 1996 135 0 10 1997 135 1 

I looked at the plm package (see here ), but could not find an answer. In addition, I am relatively new to R, so I will be happy for any hint.

+4
source share
2 answers

Here's a solution using data.table . The idea is to first create data.table with only missing records (dt.rest) , and then rbind them. I wrote it in such a way that the output of each line (by copying / pasting and printing) should be quite simple. Let me know if something is unclear.

 require(data.table) dt <- data.table(df, key="countrycode") dt$year <- as.numeric(as.character(dt$year)) dt[J(unique(countrycode)), year2 := c(tail(year, -1), NA)] dt.rest <- dt[, { tt <- which(year2-year-1 <=3); list(year = unlist(lapply(tt, function(x) seq(year[x]+1, year2[x]-1, by=1))), conflict = 0) }, by=countrycode] setcolorder(dt.rest, c("year", "countrycode", "conflict")) # year countrycode conflict # 1: 1996 135 0 # 2: 1990 750 0 # 3: 1991 750 0 # 4: 1992 750 0 

Now we just need to rbind them. This is done using the rbindlist function in data.table , which binds data.frame or data.table much more efficiently than rbind .

 dt[, year2 := NULL] dt <- rbindlist(list(dt, dt.rest)) setkey(dt, "countrycode", "year") dt # year countrycode conflict # 1: 1990 135 1 # 2: 1995 135 1 # 3: 1996 135 0 # 4: 1997 135 1 # 5: 1989 750 1 # 6: 1990 750 0 # 7: 1991 750 0 # 8: 1992 750 0 # 9: 1993 750 1 # 10: 1998 750 1 
+3
source

This solution may seem dirty and difficult to digest for a beginner, but since this is a very specific and unusual problem (at least for me), I cannot come up with anything more basic.

 # Convert the `year` column to integer in case it is a factor df$year <- as.integer(as.character(df$year)) df.country <- lapply( # Split `df` by `countrycode` to make one data frame per country split(df, df$countrycode), # Apply the following function to each coutry data frame function(tab){ # Send the start and end years of each gap to the following function imputed.yr <- mapply(function(start, end) # If the gap is small enough add all values in between # otherwise just return the start and end years if(end - start < 5) start:end else c(start, end), tab$year[-nrow(tab)], tab$year[-1]) # Remove duplicate years imputed.yr <- unique(unlist(imputed.yr)) # Pack up and return a new data frame data.frame(year = imputed.yr, contrycode = tab$countrycode[1], conflict = imputed.yr %in% tab$year) }) # Paste all the imputed country specific data frames together do.call(rbind, df.country) 

The above code outputs the following result, which essentially matches the query.

  year contrycode conflict 135.1 1990 135 TRUE 135.2 1995 135 TRUE 135.3 1996 135 FALSE 135.4 1997 135 TRUE 750.1 1989 750 TRUE 750.2 1990 750 FALSE 750.3 1991 750 FALSE 750.4 1992 750 FALSE 750.5 1993 750 TRUE 750.6 1998 750 TRUE 
+2
source

All Articles