Expand ranges defined by from and to columns

I have a data frame containing the "name" of each US president and the years in which they started and ended their term in office (columns "from" and "to"). Here is a sample:

    name           from   to
    Bill Clinton   1993 2001
    George W. Bush 2001 2009
    Barack Obama   2009 2012

... and the output from dput:

    dput(tail(presidents, 3))
    structure(list(name = c("Bill Clinton", "George W. Bush", "Barack Obama"
    ), from = c(1993, 2001, 2009), to = c(2001, 2009, 2012)), .Names = c("name",
    "from", "to"), row.names = 42:44, class = "data.frame")

I want to create a data frame with two columns ("name" and "year"), with one row for each year that a president was in office. So I need to create a regular sequence of years running from "from" to "to" for each president. Here is the output I expect:

    name           year
    Bill Clinton   1993
    Bill Clinton   1994
    ...
    Bill Clinton   2000
    Bill Clinton   2001
    George W. Bush 2001
    George W. Bush 2002
    ...
    George W. Bush 2008
    George W. Bush 2009
    Barack Obama   2009
    Barack Obama   2010
    Barack Obama   2011
    Barack Obama   2012

I know that I can use data.frame(name = "Bill Clinton", year = seq(1993, 2001)) to expand the range for a single president, but I cannot figure out how to iterate over every president.

How can I do this? I feel like I should know, but I am drawing a blank.

Update 1

Ok, I tried both solutions and I get an error:

    foo <- structure(list(name = c("Grover Cleveland", "Benjamin Harrison", "Grover Cleveland"),
        from = c(1885, 1889, 1893), to = c(1889, 1893, 1897)),
        .Names = c("name", "from", "to"), row.names = 22:24, class = "data.frame")
    ddply(foo, "name", summarise, year = seq(from, to))
    Error in seq.default(from, to) : 'from' must be of length 1
8 answers

You can use the plyr package:

    library(plyr)
    ddply(presidents, "name", summarise, year = seq(from, to))
    #             name year
    # 1   Barack Obama 2009
    # 2   Barack Obama 2010
    # 3   Barack Obama 2011
    # 4   Barack Obama 2012
    # 5   Bill Clinton 1993
    # 6   Bill Clinton 1994
    # [...]

and if it’s important to sort the data by year, you can use the arrange function:

    df <- ddply(presidents, "name", summarise, year = seq(from, to))
    arrange(df, df$year)
    #             name year
    # 1   Bill Clinton 1993
    # 2   Bill Clinton 1994
    # 3   Bill Clinton 1995
    # [...]
    # 21  Barack Obama 2011
    # 22  Barack Obama 2012

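Edit 1: Following @edgester's "Update 1": ddply groups by name, so both Grover Cleveland rows land in the same group and seq receives from and to vectors of length 2. A more suitable approach is to use adply, which works row by row and therefore accounts for presidents with non-consecutive terms:

    adply(foo, 1, summarise, year = seq(from, to))[c("name", "year")]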
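
For readers without plyr, the same row-by-row idea can be sketched in base R. This is only an illustration under my own assumptions: foo is rebuilt with data.frame() rather than the dput output, and the names group_sizes and expanded are made up for the example.

```r
# Data from "Update 1": Grover Cleveland served two non-consecutive terms
foo <- data.frame(
  name = c("Grover Cleveland", "Benjamin Harrison", "Grover Cleveland"),
  from = c(1885, 1889, 1893),
  to   = c(1889, 1893, 1897),
  stringsAsFactors = FALSE
)

# Grouping by name puts both Cleveland rows into one group, so inside
# that group `from` and `to` each have length 2 and seq() refuses them
group_sizes <- table(foo$name)

# Expanding one row at a time avoids the problem, because each call to
# seq() only ever sees a single from/to pair
expanded <- do.call(rbind, lapply(seq_len(nrow(foo)), function(i) {
  data.frame(name = foo$name[i], year = seq(foo$from[i], foo$to[i]))
}))
```

Each term spans five calendar years here, so expanded should end up with 15 rows, ten of them for Grover Cleveland.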

Here is a data.table solution. It has the nice (if minor) feature of leaving the presidents in the order in which they were supplied:

    library(data.table)
    dt <- data.table(presidents)
    dt[, list(year = seq(from, to)), by = name]
    #               name year
    #  1:   Bill Clinton 1993
    #  2:   Bill Clinton 1994
    #  ...
    #  ...
    # 21:   Barack Obama 2011
    # 22:   Barack Obama 2012

Edit: To handle presidents with non-consecutive terms, group by both name and from instead, so that each term forms its own group:

    dt[, list(year = seq(from, to)), by = c("name", "from")]

Here is a dplyr solution:

    library(dplyr)

    # the data
    presidents <-
      structure(list(name = c("Bill Clinton", "George W. Bush", "Barack Obama"
      ), from = c(1993, 2001, 2009), to = c(2001, 2009, 2012)), .Names = c("name",
      "from", "to"), row.names = 42:44, class = "data.frame")

    # the expansion of the table
    presidents %>%
      rowwise() %>%
      do(data.frame(name = .$name, year = seq(.$from, .$to, by = 1)))

    # the output
    # Source: local data frame [22 x 2]
    # Groups: <by row>
    #
    #              name  year
    #             (chr) (dbl)
    # 1    Bill Clinton  1993
    # 2    Bill Clinton  1994
    # 3    Bill Clinton  1995
    # 4    Bill Clinton  1996
    # 5    Bill Clinton  1997
    # 6    Bill Clinton  1998
    # 7    Bill Clinton  1999
    # 8    Bill Clinton  2000
    # 9    Bill Clinton  2001
    # 10 George W. Bush  2001
    # ..            ...   ...

h/t: https://stackoverflow.com/a/316877/


Another base solution, where d is the presidents data frame:

    l <- mapply(`:`, d$from, d$to)
    data.frame(name = d$name[rep(1:nrow(d), lengths(l))], year = unlist(l))
    #              name year
    # 1    Bill Clinton 1993
    # 2    Bill Clinton 1994
    # ...snip
    # 8    Bill Clinton 2000
    # 9    Bill Clinton 2001
    # 10 George W. Bush 2001
    # 11 George W. Bush 2002
    # ...snip
    # 17 George W. Bush 2008
    # 18 George W. Bush 2009
    # 19   Barack Obama 2009
    # 20   Barack Obama 2010
    # 21   Barack Obama 2011
    # 22   Barack Obama 2012
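
One caveat with the mapply approach, as far as I can tell: when every range happens to have the same length (as in the "Update 1" data), mapply simplifies its result to a matrix instead of a list, and the rep() step no longer lines up. Below is a minimal sketch of a variant built on Map, which always returns a list. The helper name expand_years is hypothetical, and foo is rebuilt with data.frame() for the example.

```r
# Hypothetical helper: expand from/to ranges into one row per year
expand_years <- function(d) {
  # Map() never simplifies, so `l` is a list even when all ranges
  # have the same length
  l <- Map(seq, d$from, d$to)
  data.frame(
    name = rep(d$name, lengths(l)),  # repeat each name once per year
    year = unlist(l),
    stringsAsFactors = FALSE
  )
}

# "Update 1" data: three terms of equal length
foo <- data.frame(
  name = c("Grover Cleveland", "Benjamin Harrison", "Grover Cleveland"),
  from = c(1885, 1889, 1893),
  to   = c(1889, 1893, 1897),
  stringsAsFactors = FALSE
)

# With equal-length ranges, plain mapply() returns a matrix
simplified <- is.matrix(mapply(`:`, foo$from, foo$to))

res <- expand_years(foo)
```

Passing SIMPLIFY = FALSE to mapply should have the same effect as switching to Map.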

Here is a quick base R solution, where Df is your data.frame:

    do.call(rbind, apply(Df, 1, function(x) {
      data.frame(name = x[1], year = seq(x[2], x[3]))
    }))

It gives some warnings about row names, but it seems to return the correct data.frame.
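
Some background on why this works and where the warnings probably come from: apply() first converts the data frame to a matrix, and because the name column is character, the year columns become strings too. The row-name warnings most likely arise because x[1] is a named vector. The sketch below is one way to make both points explicit. It assumes the presidents sample from the question, rebuilt with data.frame(), and the variable names m, rows and res are mine.

```r
presidents <- data.frame(
  name = c("Bill Clinton", "George W. Bush", "Barack Obama"),
  from = c(1993, 2001, 2009),
  to   = c(2001, 2009, 2012),
  stringsAsFactors = FALSE
)

# apply() operates on as.matrix(presidents), which is a character matrix
m <- as.matrix(presidents)
is_char <- is.character(m)

rows <- apply(presidents, 1, function(x) {
  data.frame(
    name = x[["name"]],  # [[ ]] drops the element name, avoiding the warning
    year = seq(as.numeric(x[["from"]]), as.numeric(x[["to"]])),
    stringsAsFactors = FALSE
  )
})
res <- do.call(rbind, rows)
```

Converting the years back with as.numeric() keeps the year column numeric and does not rely on seq() accepting strings.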


Another tidyverse option is to gather the data into long format, group_by name, and then complete the sequence between the from and to dates:

    library(tidyverse)

    presidents %>%
      gather(key, date, -name) %>%
      group_by(name) %>%
      complete(date = seq(date[1], date[2])) %>%
      select(-key)

    # A tibble: 22 x 2
    # Groups:   name [3]
    #    name          date
    #    <chr>        <dbl>
    #  1 Barack Obama  2009
    #  2 Barack Obama  2010
    #  3 Barack Obama  2011
    #  4 Barack Obama  2012
    #  5 Bill Clinton  1993
    #  6 Bill Clinton  1994
    #  7 Bill Clinton  1995
    #  8 Bill Clinton  1996
    #  9 Bill Clinton  1997
    # 10 Bill Clinton  1998
    # … with 12 more rows

An alternative tidyverse approach using unnest and map2:

    library(tidyverse)

    presidents %>%
      unnest(year = map2(from, to, seq)) %>%
      select(-from, -to)

    #              name  year
    # 1    Bill Clinton  1993
    # 2    Bill Clinton  1994
    # ...
    # 21   Barack Obama  2011
    # 22   Barack Obama  2012

Use by to create a by list L of data.frames, one data.frame per president, and then rbind them together. No packages are used.

    L <- by(presidents, presidents$name, with, data.frame(name, year = from:to))
    do.call("rbind", setNames(L, NULL))

If you don't mind the row names, then the last line can be reduced to:

    do.call("rbind", L)