Retrieve the last row for each object from a data frame

Question


I have a data frame like the one below in R. I would like to extract the last visit for each subject.

    SUBJID VISIT
     40161     3
     40161     4
     40161     5
     40161     6
     40161     9
     40201     3
     40202     6
     40202     8
     40241     3
     40241     4

The required output is as follows:

    SUBJID VISIT
     40161     9
     40201     3
     40202     8

How do I do this in R? Many thanks for your help.

+6

r

user2077677 Feb 16 '13 at 5:27


7 answers

N8TRO · Answer 1 · 2013-02-16T05:58:28+0000

While agstudy's answer is correct, there is another way, using the aggregate function from the stats package.

    df <- read.table(text = "SUBJID VISIT
    40161 3
    40161 4
    40161 5
    40161 6
    40161 9
    40201 3
    40202 6
    40202 8
    40241 3
    40241 4", header = TRUE)

    aggregate(VISIT ~ SUBJID, df, max)
    #   SUBJID VISIT
    # 1  40161     9
    # 2  40201     3
    # 3  40202     8
    # 4  40241     4
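Note that max returns the highest visit number, which coincides with the last row only when the visits are stored in increasing order within each subject. A minimal sketch, assuming a hypothetical data frame df2 (not from the question) where that order does not hold, swaps in tail(x, 1) to take whichever value appears last:

```r
# Hypothetical data: subject 1's visits are NOT in increasing order
df2 <- data.frame(SUBJID = c(1, 1, 1, 2, 2),
                  VISIT  = c(3, 9, 5, 6, 8))

# Largest visit number per subject
by_max <- aggregate(VISIT ~ SUBJID, df2, max)

# Visit found in the last row of each subject
by_last <- aggregate(VISIT ~ SUBJID, df2, function(x) tail(x, 1))

print(by_max)
print(by_last)
```

For the data in the question both calls give the same result, so the choice only matters if the rows may arrive unsorted.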

A5C1D2H2I1M1N2O1R2T1 · Answer 2 · 2013-02-16T06:40:48+0000

To show another alternative, because I like the simplicity of its syntax, you can use data.table too. Assuming your data.frame is called "df":

    library(data.table)  # data.table 1.8.7
    DT <- data.table(df, key = "SUBJID")
    DT[, list(VISIT = max(VISIT)), by = key(DT)]
    #    SUBJID VISIT
    # 1:  40161     9
    # 2:  40201     3
    # 3:  40202     8
    # 4:  40241     4

And, while we are sharing the many ways to do this in R: if you are comfortable with SQL syntax, you can also use sqldf as follows:

    library(sqldf)
    sqldf("select SUBJID, max(VISIT) `VISIT` from df group by SUBJID")
    #   SUBJID VISIT
    # 1  40161     9
    # 2  40201     3
    # 3  40202     8
    # 4  40241     4

user1317221_G · Answer 3 · 2013-02-16T11:42:17+0000

Because we can, here is another base R option:

    do.call(rbind,
            lapply(split(dat, dat$SUBJID), function(x) tail(x$VISIT, 1)))
    #       [,1]
    # 40161    9
    # 40201    3
    # 40202    8
    # 40241    4

EDIT

As @BenBolker suggests:

    do.call(rbind,
            lapply(split(dat, dat$SUBJID), function(x) tail(x, 1)))

should work for all columns if you have more.
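As a vectorised alternative to the split/lapply approach, the same whole-row result can be sketched with duplicated(), which also keeps every column. This is an illustration rather than part of the original answer; it rebuilds dat so that it runs on its own.

```r
dat <- read.table(text = "SUBJID VISIT
40161 3
40161 4
40161 5
40161 6
40161 9
40201 3
40202 6
40202 8
40241 3
40241 4", header = TRUE)

# With fromLast = TRUE, duplicated() flags every row except the
# last occurrence of each SUBJID; negating it keeps those last rows.
last_rows <- dat[!duplicated(dat$SUBJID, fromLast = TRUE), ]
print(last_rows)
```

Unlike the rbind version, the original row names (5, 6, 8, 10) are preserved, and the rows do not need to be grouped by subject beforehand.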

agstudy · Answer 4 · 2013-02-16T05:36:43+0000

Using the plyr package, for example:

    library(plyr)
    ddply(dat, .(SUBJID), summarise, VISIT = tail(VISIT, 1))
    #   SUBJID VISIT
    # 1  40161     9
    # 2  40201     3
    # 3  40202     8
    # 4  40241     4

where dat is:

    dat <- read.table(text = "SUBJID VISIT
    40161 3
    40161 4
    40161 5
    40161 6
    40161 9
    40201 3
    40202 6
    40202 8
    40241 3
    40241 4", header = TRUE)

Sven hohenstein · Answer 5 · 2013-02-16T13:58:38+0000

Here's a simple solution with diff:

    dat[c(diff(dat$SUBJID) != 0, TRUE), ]
    #    SUBJID VISIT
    # 5   40161     9
    # 6   40201     3
    # 8   40202     8
    # 10  40241     4
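The diff approach relies on all rows for a subject being adjacent, as they are in the question. A minimal sketch, assuming a hypothetical unsorted data frame dat2, orders the rows first; because it also orders by VISIT, "last" here means the highest visit number.

```r
# Hypothetical data: rows for the same subject are not adjacent
dat2 <- data.frame(SUBJID = c(40202, 40161, 40202, 40161),
                   VISIT  = c(6, 3, 8, 9))

# Order by subject, then by visit, so each subject's latest visit comes last
sorted <- dat2[order(dat2$SUBJID, dat2$VISIT), ]

# TRUE wherever the next row belongs to a different subject;
# the trailing TRUE keeps the final row of the data frame
is_last <- c(diff(sorted$SUBJID) != 0, TRUE)

result <- sorted[is_last, ]
print(result)
```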

This is also possible with by:

    do.call(rbind, by(dat, dat$SUBJID, tail, 1))
    #       SUBJID VISIT
    # 40161  40161     9
    # 40201  40201     3
    # 40202  40202     8
    # 40241  40241     4

Frank · Answer 6 · 2019-06-11T19:50:21+0000

Alternatively (with @agstudy's data, stored as df),

    g <- grouping(df$SUBJID)
    df[g[attr(g, "ends")], ]
    #    SUBJID VISIT
    # 5   40161     9
    # 6   40201     3
    # 8   40202     8
    # 10  40241     4

or with data.table:

    library(data.table)
    unique(setDT(df), by = "SUBJID", fromLast = TRUE)
    #    SUBJID VISIT
    # 1:  40161     9
    # 2:  40201     3
    # 3:  40202     8
    # 4:  40241     4

Jason mathews · Answer 7 · 2019-06-13T17:13:59+0000

This can also be done with the sqldf package:

    library(sqldf)
    sqldf("SELECT SUBJID, MAX(VISIT) AS VISIT FROM df GROUP BY SUBJID")
    #   SUBJID VISIT
    # 1  40161     9
    # 2  40201     3
    # 3  40202     8
    # 4  40241     4
