I would like to hear how people work with panel data in R when the data sets are large (for example, 50 mil+ obs). The data.table package is useful because it has keys and is very fast. The xts package is useful because it can do all kinds of time series operations. So there seem to be two good options:
- have a data.table and write your own time series functions to work on it
- have a list of xts objects and run lapply on that list every time you want to do something; in the end, the results need to be combined into a data.frame to run regressions, etc.
I am aware of the plm package, but did not find it as useful for data management as the two options above. What are you guys using? Any ideas on what works best?
Let me suggest a scenario: imagine N firms with T time periods each, where N >> 0 and T >> 0. data.table will be very fast if I want to lag each firm's values by one time period, for example:
library(data.table)

# panel of 10 firms over 10 days, keyed by firm and date
x <- data.table(id  = 1:10,
                dte = rep(seq(from = as.Date("2012-01-01"),
                              to = as.Date("2012-01-10"), by = "day"),
                          each = 10),
                val = 1:100,
                key = c("id", "dte"))

# one-period lag of val within each firm
x[, lag_val := c(NA, head(val, -1)), by = id]
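As a side note, newer versions of data.table also provide shift(), which does the same lag a bit more cleanly; this is just the equivalent one-liner, assuming a data.table recent enough to have it:

# same lag as above, using the built-in shift()
x[, lag_val := shift(val, 1), by = id]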
Another way to do this might be:
library(xts)

# build one xts object per firm
ids <- unique(x$id)
y <- lapply(ids, function(i) xts(x[id == i, val], order.by = x[id == i, dte]))

# add the one-period lag as a second column of each object
y <- lapply(y, function(obj) cbind(obj, lag(obj, 1)))
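To make the last step of the second option concrete, here is a minimal sketch of stacking the list back into a data.frame for a pooled regression (the column names and the lm() formula are just illustrative):

# stack the per-firm xts objects into one data.frame
stacked <- do.call(rbind, lapply(seq_along(y), function(i) {
  data.frame(id = ids[i], dte = index(y[[i]]), coredata(y[[i]]))
}))
names(stacked) <- c("id", "dte", "val", "lag_val")

# e.g. a pooled OLS, purely as an illustration
fit <- lm(val ~ lag_val, data = stacked)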
The advantage of the former is speed with big data. The advantage of the latter is the ability to do things like period.apply and use other xts functions. Are there any tricks to make creating the xts objects faster? Maybe a combination of the two? Converting to and from xts objects can be costly.
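One sketch of the "combination of the two" idea: keep storage in the data.table and build a throwaway xts inside each by group only when an xts-specific function is actually needed (apply.monthly and the column names here are just an example I picked):

# temporary xts per firm, discarded after the xts-only step
monthly <- x[, {
  z <- xts(val, order.by = dte)    # convert this firm's slice
  m <- apply.monthly(z, mean)      # any xts-only operation
  list(month_end = index(m), avg_val = as.numeric(m))
}, by = id]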