How to subset the last 12 months of data for each identifier in a data frame?

I have a data frame representing 15 years of monitoring several hundred patients. I want to create a subset of a data frame, including the last 12 months of data for each patient.

The following is a representative example of my data (including one missing value, as there is some missing data in my actual dataset):

# Create example dataset.
example.dat <- data.frame(
  ID = c(1,1,1,1,2,2,2,3,3,3), # patient ID numbers
  Date = as.Date(c("2000-02-01", "2004-10-21", "2005-02-06", # follow-up dates
                   "2005-06-14", "2002-11-24", "2009-03-05",
                   "2009-07-20", "2005-09-02", "2006-01-15",
                   "2006-05-18")),
  Cat = c("Yes", "Yes", "No", "Yes", "No", # responses to a categorical variable
          "Yes", "Yes", NA,   "No", "No")
  )

example.dat

Which gives the following result:

   ID       Date  Cat
1   1 2000-02-01  Yes
2   1 2004-10-21  Yes
3   1 2005-02-06   No
4   1 2005-06-14  Yes
5   2 2002-11-24   No
6   2 2009-03-05  Yes
7   2 2009-07-20  Yes
8   3 2005-09-02 <NA>
9   3 2006-01-15   No
10  3 2006-05-18   No

I need to figure out how to subset, for each ID number, the most recent record and all records from the preceding 12 months. The desired result is:

   ID       Date  Cat
2   1 2004-10-21  Yes
3   1 2005-02-06   No
4   1 2005-06-14  Yes
6   2 2009-03-05  Yes
7   2 2009-07-20  Yes
8   3 2005-09-02 <NA>
9   3 2006-01-15   No
10  3 2006-05-18   No



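Here is a base R approach using ave. "Date" is converted to numeric before it is passed to ave, and converted back to "Date" inside the function. ave returns 0/1 here, so !! converts the result to FALSE/TRUE.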

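in_last_yr <- function(x) {
  # x arrives as numeric (days since 1970-01-01); recover the latest date
  max_date <- as.Date(max(x), origin = "1970-01-01")
  # TRUE for dates strictly after the day one calendar year before max_date
  x > seq(max_date, length.out = 2, by = "-1 year")[2]
}
# ave() returns 0/1 here, so !! turns the result into a logical row filter
subset(example.dat, !!ave(as.numeric(Date), ID, FUN = in_last_yr))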
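To see why the !! is needed, here is a small sketch on toy values (the vectors are made up for illustration, not taken from the question). When ave() is given a numeric vector, it writes the logical results of FUN back into that vector, so they come out as 0/1.

```r
# Toy values and groups, purely illustrative
vals   <- c(1, 5, 2, 9)
groups <- c("a", "a", "b", "b")

# FUN returns logicals, but ave() stores them in the numeric input vector
flags <- ave(vals, groups, FUN = function(v) v == max(v))

is.numeric(flags)  # the logicals were coerced to 0/1
keep <- !!flags    # double negation restores TRUE/FALSE
is.logical(keep)
```

as.logical(flags) would do the same job as !!flags and may read more clearly.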



Using dplyr:

library(dplyr)

example.dat %>% group_by(ID) %>% filter(Date >= max(Date)-365)

#Source: local data frame [8 x 3]
#Groups: ID
#
#  ID       Date Cat
#1  1 2004-10-21 Yes
#2  1 2005-02-06  No
#3  1 2005-06-14 Yes
#4  2 2009-03-05 Yes
#5  2 2009-07-20 Yes
#6  3 2005-09-02  NA
#7  3 2006-01-15  No
#8  3 2006-05-18  No

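For completeness, here are two data.table approaches. Both use lubridate to subtract 12 months.

The first data.table approach is essentially docendo discimus' dplyr answer. However, it uses lubridate to subtract 12 months rather than 365 days, as requested by the OP: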

library(data.table)
library(lubridate)
setDT(example.dat)[, .SD[Date >= max(Date) %m-% years(1)], by = ID]
   ID       Date Cat
1:  1 2004-10-21 Yes
2:  1 2005-02-06  No
3:  1 2005-06-14 Yes
4:  2 2009-03-05 Yes
5:  2 2009-07-20 Yes
6:  3 2005-09-02  NA
7:  3 2006-01-15  No
8:  3 2006-05-18  No

As of v1.9.8 (on CRAN 25 Nov 2016), data.table supports non-equi joins:

library(data.table)
library(lubridate)
mDT <- setDT(example.dat)[, max(Date) %m-% years(1), by = ID]
example.dat[example.dat[mDT, on = .(ID, Date >= V1), which = TRUE]]
   ID       Date Cat
1:  1 2004-10-21 Yes
2:  1 2005-02-06  No
3:  1 2005-06-14 Yes
4:  2 2009-03-05 Yes
5:  2 2009-07-20 Yes
6:  3 2005-09-02  NA
7:  3 2006-01-15  No
8:  3 2006-05-18  No

mDT contains the start date of the 12-month period for each ID:

   ID         V1
1:  1 2004-06-14
2:  2 2008-07-20
3:  3 2005-05-18

The non-equi join

example.dat[mDT, on = .(ID, Date >= V1), which = TRUE]
[1]  2  3  4  6  7  8  9 10

returns the row indices of example.dat that satisfy the join condition.
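For readers without data.table, the same two steps (a per-ID cutoff, then the rows on or after it) can be sketched in base R. This is only an illustration under stated assumptions: the Cat column is left out for brevity, and the names last_date, cutoff and keep are made up here. It uses seq() with by = "-1 year" instead of lubridate, so a leap-day reference date would be handled differently from %m-%.

```r
# Example data from the question (Cat column omitted for brevity)
example.dat <- data.frame(
  ID = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3),
  Date = as.Date(c("2000-02-01", "2004-10-21", "2005-02-06",
                   "2005-06-14", "2002-11-24", "2009-03-05",
                   "2009-07-20", "2005-09-02", "2006-01-15",
                   "2006-05-18"))
)

# Step 1: latest date per ID, repeated on every row of that ID.
# ave() is fed a numeric vector, so convert and then restore the Date class.
last_date <- as.Date(
  ave(as.numeric(example.dat$Date), example.dat$ID, FUN = max),
  origin = "1970-01-01"
)

# Step 2: cutoff one calendar year earlier, computed row by row with seq()
cutoff <- do.call(c, lapply(seq_along(last_date), function(i) {
  seq(last_date[i], length.out = 2, by = "-1 year")[2]
}))

# Step 3: row indices on or after the cutoff, then the subset itself
keep <- which(example.dat$Date >= cutoff)
example.dat[keep, ]
```

On this data, keep holds the same row indices as the join above.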

Finally, a comparison of the different ways to subtract 12 months:

library(data.table)
library(lubridate)
mseq <- Vectorize(function(x) seq(x, length = 2L, by = "-1 year")[2L])
data.table(Date = as.Date("2016-02-28") + 0:2)[
  , minus_365d := Date -365][
    , minus_1yr := Date - years()][
      , minus_1yr_m := Date %m-% years()][
        , seq.Date := as_date(mseq(Date))][]
         Date minus_365d  minus_1yr minus_1yr_m   seq.Date
1: 2016-02-28 2015-02-28 2015-02-28  2015-02-28 2015-02-28
2: 2016-02-29 2015-03-01       <NA>  2015-02-28 2015-03-01
3: 2016-03-01 2015-03-02 2015-03-01  2015-03-01 2015-03-01
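  • If no leap day is involved, all approaches return the same result (row 1).
  • If the period spans a leap day, subtracting 365 days is not the same as subtracting 12 months (row 3), because the leap year has 366 days.
  • If the reference date is the leap day itself (row 2), seq.Date() picks the next day, 1 March 2015, because 2015 has no 29 February. lubridate's %m-% instead rolls back to the last day of February, 28 February 2015, while plain Date - years() returns NA.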
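If lubridate is not available, the roll-back behaviour of %m-% for whole years can be approximated in base R. The helper below, minus_one_year, is a hypothetical name introduced only for this sketch. It assumes that 29 February is the only day that can disappear when shifting by one year.

```r
# Hypothetical helper: subtract one calendar year and roll back to 28 Feb
# when the target date does not exist (i.e. the input is 29 Feb).
minus_one_year <- function(d) {
  target_year <- as.integer(format(d, "%Y")) - 1L
  # With an explicit format, an impossible date such as 2015-02-29 becomes NA
  out <- as.Date(paste0(target_year, format(d, "-%m-%d")), format = "%Y-%m-%d")
  # Replace those NAs (but not NAs already present in the input) with 28 Feb
  missing_day <- is.na(out) & !is.na(d)
  if (any(missing_day)) {
    out[missing_day] <- as.Date(paste0(target_year[missing_day], "-02-28"),
                                format = "%Y-%m-%d")
  }
  out
}

# Same three dates as in the comparison table
dates <- as.Date("2016-02-28") + 0:2
shifted <- minus_one_year(dates)
shifted
```

For these three dates the result should agree with the minus_1yr_m column of the table.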