Retrieve the last row for each object from a data frame

I have the following data frame in R. I would like to extract the last visit for each subject.

    SUBJID VISIT
    40161  3
    40161  4
    40161  5
    40161  6
    40161  9
    40201  3
    40202  6
    40202  8
    40241  3
    40241  4

The required output is as follows:

    SUBJID VISIT
    40161  9
    40201  3
    40202  8

How do I do this in R? Many thanks for your help.

7 answers

While agstudy's answer is correct, there is another way, using the aggregate function from the stats package.

    df <- read.table(text = "SUBJID VISIT
    40161 3
    40161 4
    40161 5
    40161 6
    40161 9
    40201 3
    40202 6
    40202 8
    40241 3
    40241 4", header = TRUE)

    aggregate(VISIT ~ SUBJID, df, max)
    #   SUBJID VISIT
    # 1  40161     9
    # 2  40201     3
    # 3  40202     8
    # 4  40241     4
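Note that max returns the highest VISIT per subject, which matches the last row only when visits are stored in ascending order within each subject. Here is a minimal sketch of the difference, using a small made-up data frame (`unsorted` is a hypothetical name, not part of the question) and base R's duplicated with fromLast = TRUE to pick the positionally last row instead:

```r
# Hypothetical data: subject 1's visits are NOT in ascending order
unsorted <- data.frame(SUBJID = c(1, 1, 1, 2, 2),
                       VISIT  = c(3, 9, 4, 6, 8))

# Largest VISIT per subject
by_max <- aggregate(VISIT ~ SUBJID, unsorted, max)

# Positionally last row per subject: keep the rows whose SUBJID
# does not occur again further down the data frame
by_pos <- unsorted[!duplicated(unsorted$SUBJID, fromLast = TRUE), ]

by_max$VISIT  # 9 8
by_pos$VISIT  # 4 8
```

On the data in the question both approaches give the same rows, because the visits are already sorted.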

To show another alternative, because I like the simplicity of its syntax, you can use data.table too. Assuming your data.frame is called "df":

    library(data.table)  # data.table 1.8.7
    DT <- data.table(df, key = "SUBJID")
    DT[, list(VISIT = max(VISIT)), by = key(DT)]
    #    SUBJID VISIT
    # 1:  40161     9
    # 2:  40201     3
    # 3:  40202     8
    # 4:  40241     4

And, while we are sharing the many ways to do this in R, if you are comfortable with SQL syntax, you can also use sqldf as follows:

    library(sqldf)
    sqldf("select SUBJID, max(VISIT) `VISIT` from df group by SUBJID")
    #   SUBJID VISIT
    # 1  40161     9
    # 2  40201     3
    # 3  40202     8
    # 4  40241     4

Because we can, another base R option:

    do.call(rbind, lapply(split(dat, dat$SUBJID), function(x) tail(x$VISIT, 1)))
    #       [,1]
    # 40161    9
    # 40201    3
    # 40202    8
    # 40241    4

EDIT

As @BenBolker suggests:

    do.call(rbind, lapply(split(dat, dat$SUBJID), function(x) tail(x, 1)))

should work for all columns if you have more.
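As a small illustration of the all-columns variant, here is a sketch with an extra column. The data frame `dat2` and its SITE column are made up for this example and are not part of the question:

```r
# Hypothetical data with a third column
dat2 <- data.frame(SUBJID = c(40161, 40161, 40201),
                   VISIT  = c(3, 9, 3),
                   SITE   = c("A", "B", "C"))

# tail(x, 1) keeps the whole last row of each subject's block,
# so every column is carried along
last <- do.call(rbind, lapply(split(dat2, dat2$SUBJID),
                              function(x) tail(x, 1)))

last$VISIT               # 9 3
as.character(last$SITE)  # "B" "C"
```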


Using the plyr package, for example:

    library(plyr)
    ddply(dat, .(SUBJID), summarise, VISIT = tail(VISIT, 1))
    #   SUBJID VISIT
    # 1  40161     9
    # 2  40201     3
    # 3  40202     8
    # 4  40241     4

where dat is:

    dat <- read.table(text = 'SUBJID VISIT
    40161 3
    40161 4
    40161 5
    40161 6
    40161 9
    40201 3
    40202 6
    40202 8
    40241 3
    40241 4', header = TRUE)

Here's a simple solution with diff (it assumes the rows of each subject are contiguous and SUBJID is numeric):

    dat[c(diff(dat$SUBJID) != 0, TRUE), ]
    #    SUBJID VISIT
    # 5   40161     9
    # 6   40201     3
    # 8   40202     8
    # 10  40241     4

This is also possible with by:

    do.call(rbind, by(dat, dat$SUBJID, tail, 1))
    #       SUBJID VISIT
    # 40161  40161     9
    # 40201  40201     3
    # 40202  40202     8
    # 40241  40241     4
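The diff approach only sees a subject boundary where SUBJID changes from one row to the next, so interleaved rows would produce spurious "last" rows. Here is a minimal sketch of sorting first, on made-up data (`shuffled`, `sorted` and `last_rows` are hypothetical names):

```r
# Hypothetical data: rows for subject 40161 are interleaved with 40201
shuffled <- data.frame(SUBJID = c(40161, 40201, 40161, 40202),
                       VISIT  = c(3, 3, 9, 8))

# Sort by subject, then by visit, so each subject forms one contiguous run
sorted <- shuffled[order(shuffled$SUBJID, shuffled$VISIT), ]

# The last row of each run is where SUBJID changes, plus the final row
last_rows <- sorted[c(diff(sorted$SUBJID) != 0, TRUE), ]

last_rows$SUBJID  # 40161 40201 40202
last_rows$VISIT   # 9 3 8
```

Sorting by VISIT as well means "last" is taken as the highest visit number. If "last" should instead mean the original row order, sort by SUBJID only; order() breaks ties by original position.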

Alternatively (with @agstudy's data),

    g <- grouping(df$SUBJID)
    df[g[attr(g, "ends")], ]
    #    SUBJID VISIT
    # 5   40161     9
    # 6   40201     3
    # 8   40202     8
    # 10  40241     4

or with data.table:

    library(data.table)
    unique(setDT(df), by = "SUBJID", fromLast = TRUE)
    #    SUBJID VISIT
    # 1:  40161     9
    # 2:  40201     3
    # 3:  40202     8
    # 4:  40241     4

It can also be done with the sqldf package:

    library(sqldf)
    sqldf("SELECT SUBJID, MAX(VISIT) AS VISIT FROM df GROUP BY SUBJID")
    #   SUBJID VISIT
    # 1  40161     9
    # 2  40201     3
    # 3  40202     8
    # 4  40241     4