I would like to hear how people work with panel data in R when the data sets are large (for example, 50 mil+ obs). The data.table package is useful because it has keys and is very fast. The xts package is useful because it can do all kinds of time series operations. So there seem to be two good options:
- have a data.table and write your own time series functions to work on it
- have a list of xts objects and run lapply on that list every time you want to do something; in the end, the results need to be combined into a data.frame to run regressions, etc.
I am aware of the plm package, but did not find it as useful for data management as the two options above. What are you guys using? Any ideas on what works best?
Let me suggest a scenario: imagine N firms with T time periods each, where N >> 0 and T >> 0. data.table will be very fast if I want to lag each firm's values by one time period, for example:
library(data.table)

# panel of 10 firms over 10 days, keyed by firm and date
x <- data.table(id  = 1:10,
                dte = rep(seq(from = as.Date("2012-01-01"),
                              to = as.Date("2012-01-10"), by = "day"),
                          each = 10),
                val = 1:100,
                key = c("id", "dte"))

# one-period lag of val within each firm
x[, lag_val := c(NA, head(val, -1)), by = id]
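As a side note, newer versions of data.table also provide shift(), which does the same lag a bit more cleanly; this is just the equivalent one-liner, assuming a data.table recent enough to have it:

# same lag as above, using the built-in shift()
x[, lag_val := shift(val, 1), by = id]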
Another way to do this might be:
library(xts)

# build one xts object per firm
ids <- unique(x$id)
y <- lapply(ids, function(i) xts(x[id == i, val], order.by = x[id == i, dte]))

# add the one-period lag as a second column of each object
y <- lapply(y, function(obj) cbind(obj, lag(obj, 1)))
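To make the last step of the second option concrete, here is a minimal sketch of stacking the list back into a data.frame for a pooled regression (the column names and the lm() formula are just illustrative):

# stack the per-firm xts objects into one data.frame
stacked <- do.call(rbind, lapply(seq_along(y), function(i) {
  data.frame(id = ids[i], dte = index(y[[i]]), coredata(y[[i]]))
}))
names(stacked) <- c("id", "dte", "val", "lag_val")

# e.g. a pooled OLS, purely as an illustration
fit <- lm(val ~ lag_val, data = stacked)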
The advantage of the former is speed with big data. The advantage of the latter is the ability to do things like period.apply and use other xts functions. Are there any tricks to make creating the xts objects faster? Maybe a combination of the two? Converting to and from xts objects can be costly.
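One sketch of the "combination of the two" idea: keep storage in the data.table and build a throwaway xts inside each by group only when an xts-specific function is actually needed (apply.monthly and the column names here are just an example I picked):

# temporary xts per firm, discarded after the xts-only step
monthly <- x[, {
  z <- xts(val, order.by = dte)    # convert this firm's slice
  m <- apply.monthly(z, mean)      # any xts-only operation
  list(month_end = index(m), avg_val = as.numeric(m))
}, by = id]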