From unbalanced to balanced panel

Question

I want to fill in the missing rows within each group, where a group is defined by a specific (id1, id2) pair.

For example, I have a dataset with

    id1 id2 year value
     33  29 1990   3.5
     33  29 1993   3.3
     33  29 1994   3.1
     32  28 1992   3.1
     32  28 1993   4.5

I would like to get the following dataset

    id1 id2 year value
     33  29 1990   3.5
     33  29 1991    NA
     33  29 1992    NA
     33  29 1993   3.3
     33  29 1994   3.1
     32  28 1992   3.1
     32  28 1993   4.5

Note that no rows need to be created for the second group outside its own range of years (for example, year == 1990 or year == 1991). The example is simplified, but the solution should work for both string and numeric ids, and for several value columns instead of one.

r data.table dplyr

Matthew Sep 05 '14 at 10:42

2 answers

There are probably shorter ways to do this, but this one uses only standard data.frames (no data.table). Here is your sample data in dput() form:

    dd <- structure(list(
      id1 = c(33L, 33L, 33L, 32L, 32L),
      id2 = c(29L, 29L, 29L, 28L, 28L),
      year = c(1990L, 1993L, 1994L, 1992L, 1993L),
      value = c(3.5, 3.3, 3.1, 3.1, 4.5)
    ), .Names = c("id1", "id2", "year", "value"),
    class = "data.frame", row.names = c(NA, -5L))

I will also use a helper function to get rid of the ugly default row names:

    unrowname <- function(x) `rownames<-`(x, NULL)

And then I transform the data with

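    do.call(rbind, unname(lapply(
      split(dd, interaction(dd$id1, dd$id2, drop = TRUE)),
      function(x) {
        # full run of years covered by this group
        r <- seq(from = min(x$year), to = max(x$year))
        # repeat the group's ids and look up each year's value (NA if absent)
        cbind(unrowname(x[1, 1:2]), year = r, value = x$value[match(r, x$year)])
      }
    )))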

which gives

      id1 id2 year value
    1  32  28 1992   3.1
    2  32  28 1993   4.5
    3  33  29 1990   3.5
    4  33  29 1991    NA
    5  33  29 1992    NA
    6  33  29 1993   3.3
    7  33  29 1994   3.1

As long as you don't mind the groups coming out in a different order, it should work fine.
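The question also asks for several value columns. One way that might handle this with base R is to build only the (id1, id2, year) skeleton per group and then left-join the original rows onto it with merge(), so every non-key column comes along automatically. This is a rough sketch, not part of the original answer; the `label` column and the names `skeleton` and `balanced` are made up for illustration.

```r
# Sample data from the question, plus a hypothetical second value column
dd <- data.frame(
  id1   = c(33L, 33L, 33L, 32L, 32L),
  id2   = c(29L, 29L, 29L, 28L, 28L),
  year  = c(1990L, 1993L, 1994L, 1992L, 1993L),
  value = c(3.5, 3.3, 3.1, 3.1, 4.5),
  label = c("a", "b", "c", "d", "e"),  # hypothetical extra column
  stringsAsFactors = FALSE
)

# Build the full (id1, id2, year) skeleton, one block of rows per group
skeleton <- do.call(rbind, lapply(
  split(dd, interaction(dd$id1, dd$id2, drop = TRUE)),
  function(x) data.frame(
    id1  = x$id1[1],
    id2  = x$id2[1],
    year = seq(min(x$year), max(x$year))
  )
))

# Keep every skeleton row; years absent from dd get NA in all value columns
balanced <- merge(skeleton, dd, by = c("id1", "id2", "year"), all.x = TRUE)
rownames(balanced) <- NULL
balanced
```

Because the join is on the key columns only, adding more value columns to `dd` should not require any change to the code.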

Mrflick Sep 06 '14 at 0:35

Arun · Accepted Answer · 2014-09-06T07:13:47+0000

How about this?

    require(data.table)
    DT = data.table(id1 = c(33,33,33,32,32), id2 = c(29,29,29,28,28),
                    year = c(1990,1993,1994,1991,1992),
                    value = c(3.5,3.3,3.1,3.1,4.5))
    setkey(DT, id1, id2, year)
    ans = DT[, list(year = seq.int(year[1L], year[.N])), by = list(id1, id2)]
    ans = DT[setkey(ans)]
    #    id1 id2 year value
    # 1:  32  28 1991   3.1
    # 2:  32  28 1992   4.5
    # 3:  33  29 1990   3.5
    # 4:  33  29 1991    NA
    # 5:  33  29 1992    NA
    # 6:  33  29 1993   3.3
    # 7:  33  29 1994   3.1
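The same idea can probably be written without setting keys, assuming a data.table version that supports the `on` argument for joins. The sketch below is a variation rather than the answer's own code: it uses min() and max() because the table is not sorted first, and it adds a hypothetical `other` column to show that every non-key column is carried through the join. The name `grid` is also made up for illustration.

```r
library(data.table)

DT <- data.table(
  id1   = c(33L, 33L, 33L, 32L, 32L),
  id2   = c(29L, 29L, 29L, 28L, 28L),
  year  = c(1990L, 1993L, 1994L, 1991L, 1992L),
  value = c(3.5, 3.3, 3.1, 3.1, 4.5),
  other = c("a", "b", "c", "d", "e")  # hypothetical second value column
)

# One row per year between each group's first and last observed year
grid <- DT[, .(year = seq.int(min(year), max(year))), by = .(id1, id2)]

# Look up each grid row in DT; grid rows with no match get NA in value and other
ans <- DT[grid, on = .(id1, id2, year)]
ans
```

Without a key, the result follows the row order of `grid`, so groups appear in the order they first occur in `DT` instead of being sorted by id.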
