Lagging forward in plm

This is a very simple question, but I could not find a definitive answer, so I thought I would ask. I use the plm package to work with panel data. I am trying to use the lag function to lag a variable FORWARD in time (the default is to retrieve the value from the previous period, and I want the value from the NEXT one). I found a number of old articles and questions (circa 2009) suggesting that this is possible by passing k=-1 as an argument. However, when I try this, I get an error.

Code example:

    library(plm)
    df <- as.data.frame(matrix(c(1, 1, 1, 2, 2, 3,
                                 20101231, 20111231, 20121231, 20111231, 20121231, 20121231,
                                 50, 60, 70, 120, 130, 210),
                               nrow = 6, ncol = 3))
    names(df) <- c("individual", "date", "data")
    df$date <- as.Date(as.character(df$date), format = "%Y%m%d")
    df.plm <- pdata.frame(df, index = c("individual", "date"))

Lagging:

    lag(df.plm$data, 0)
    ## returns
    1-2010-12-31 1-2011-12-31 1-2012-12-31 2-2011-12-31 2-2012-12-31 3-2012-12-31
              50           60           70          120          130          210

    lag(df.plm$data, 1)
    ## returns
    1-2010-12-31 1-2011-12-31 1-2012-12-31 2-2011-12-31 2-2012-12-31 3-2012-12-31
              NA           50           60           NA          120           NA

    lag(df.plm$data, -1)
    ## returns
    Error in rep(1, ak) : invalid 'times' argument

I also read that plm.data replaced pdata.frame for some applications in plm. However, plm.data does not work with the lag function at all:

    df.plm <- plm.data(df, indexes = c("individual", "date"))
    lag(df.plm$data, 1)
    ## returns
    [1]  50  60  70 120 130 210
    attr(,"tsp")
    [1] 0 5 1

I would be grateful for any help. If anyone can suggest another package to use, I am all ears. However, I really love plm, because it automatically handles lagging across multiple individuals and skips gaps in the time series.

2 answers

EDIT2: Lagging forward (= leading values) is implemented in plm CRAN releases >= 1.6-4. The functions are lead() and lag() (the latter with a negative integer for leading values).

Beware of other attached packages that use the same function names. To be safe, you can refer to the function by its full namespace, e.g. plm::lead.

Examples from ?plm::lead:

    # First, create a pdata.frame
    data("EmplUK", package = "plm")
    Em <- pdata.frame(EmplUK)

    # Then extract a series, which becomes additionally a pseries
    z <- Em$output
    class(z)

    # compute negative lags (= leading values)
    lag(z, -1)
    lead(z, 1) # same as line above
    identical(lead(z, 1), lag(z, -1)) # TRUE
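
Applied to the data from the question, the same idea might look like the sketch below. It assumes plm >= 1.6-4 is installed, and it swaps the question's Date column for a plain integer `year` column (my simplification, not part of the original post) so that the time index is unambiguous. The calls are namespace-qualified to sidestep clashes with other packages that define lag() or lead(), and the guard simply skips the example when plm is not available.

```r
# Sketch: leading values for the question's panel, assuming plm >= 1.6-4.
if (requireNamespace("plm", quietly = TRUE)) {
  df <- data.frame(
    individual = c(1, 1, 1, 2, 2, 3),
    year       = c(2010, 2011, 2012, 2011, 2012, 2012),  # simplified time index
    data       = c(50, 60, 70, 120, 130, 210)
  )
  df.plm <- plm::pdata.frame(df, index = c("individual", "year"))

  # Value from the NEXT period within each individual
  next_value <- plm::lead(df.plm$data, 1)
  print(next_value)

  # Equivalent call through the lag() generic with a negative k
  next_value_via_lag <- stats::lag(df.plm$data, k = -1)
}
```

The last observation of each individual has no following period, so its leading value is NA.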

I had the same problem and could not find a good solution in plm or any other package. ddply was tempting (for example, s5 = ddply(df, .(country,year), transform, lag=lag(df[, "value-to-lag"], lag=3)) ), but I could not get the NAs in my lagged column to line up properly for lags other than one.

I wrote a brute-force solution that iterates over the data frame row by row and populates the lagged column with the appropriate value. It is horrendously slow (437.33s for my 13000x130 data frame versus 0.012s for turning it into a pdata.frame and using lag), but it got the job done for me. I thought I would share it because I could not find much information elsewhere on the Internet.

In the function below:

  • df is your data frame. The function returns df with a new column containing the forward values.
  • group is the column name of the grouping variable for your panel data. For example, I had longitudinal data for several countries, and I used "Country.Name" here.
  • x is the column from which you want to generate lagged values, e.g. "GDP".
  • forwardx is the (new) column that will contain the forward lags, e.g. "GDP.next.year".
  • lag is the number of periods into the future. For example, if your data were taken at annual intervals, using lag=5 would set forwardx to the value of x five years later.

    add_forward_lag <- function(df, group, x, forwardx, lag) {
      for (i in 1:(nrow(df) - lag)) {
        if (as.character(df[i, group]) == as.character(df[i + lag, group])) {
          # put forward observation in forwardx
          df[i, forwardx] <- df[i + lag, x]
        } else {
          # end of group, no forward observation
          df[i, forwardx] <- NA
        }
      }
      # last elem(s) in forwardx are NA
      for (j in ((nrow(df) - lag + 1):nrow(df))) {
        df[j, forwardx] <- NA
      }
      return(df)
    }

See sample output using the DNase built-in dataset. This doesn't make sense in the context of the dataset, but lets you see what the columns do.

    data(DNase)  # DNase is a built-in dataset, not a package
    add_forward_lag(DNase, "Run", "density", "lagged_density", 3)

    Grouped Data: density ~ conc | Run
       Run        conc density lagged_density
    1    1  0.04882812   0.017          0.124
    2    1  0.04882812   0.018          0.206
    3    1  0.19531250   0.121          0.215
    4    1  0.19531250   0.124          0.377
    5    1  0.39062500   0.206          0.374
    6    1  0.39062500   0.215          0.614
    7    1  0.78125000   0.377          0.609
    8    1  0.78125000   0.374          1.019
    9    1  1.56250000   0.614          1.001
    10   1  1.56250000   0.609          1.334
    11   1  3.12500000   1.019          1.364
    12   1  3.12500000   1.001          1.730
    13   1  6.25000000   1.334          1.710
    14   1  6.25000000   1.364             NA
    15   1 12.50000000   1.730             NA
    16   1 12.50000000   1.710             NA
    17   2  0.04882812   0.045          0.123
    18   2  0.04882812   0.050          0.225
    19   2  0.19531250   0.137          0.207

Given how long this takes, you may prefer a different approach: lag all the other variables backward instead.
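
Another way to avoid the row-by-row loop is to shift each group's values as a whole vector. The following is only a sketch in base R, not the method used above: the function name `add_forward_lag_fast` and the helper `shift_up` are made up for illustration, and it assumes the rows are already sorted by time within each group. Unlike plm's lag, it shifts by row position and does not detect gaps in the time series.

```r
# Hypothetical vectorized variant of add_forward_lag (base R only).
# Assumes rows are sorted by time within each group.
add_forward_lag_fast <- function(df, group, x, forwardx, lag = 1) {
  # Pad the group's values with `lag` NAs, then read them `lag` positions ahead
  shift_up <- function(v) c(v, rep(NA, lag))[seq_along(v) + lag]
  # ave() applies shift_up separately within each level of the grouping column
  df[[forwardx]] <- ave(df[[x]], df[[group]], FUN = shift_up)
  df
}

# Small panel with the same values as the data in the question
panel <- data.frame(
  id    = c(1, 1, 1, 2, 2, 3),
  year  = c(2010, 2011, 2012, 2011, 2012, 2012),
  value = c(50, 60, 70, 120, 130, 210)
)
res <- add_forward_lag_fast(panel, "id", "value", "value_next", lag = 1)
print(res)
```

Because the shift is done per group, the last `lag` rows of every group come out as NA without any explicit end-of-group check.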
