How to subtract the first record from the last record in grouped data

I would appreciate help with the following task: in the data frame below (C), for each id, I would like to subtract the first record in column d_2 from the last record, and then save the results in another data frame containing the same ids. I can then merge this with my original data frame. Note that the subtraction must be in that order (last record minus first record for each id).

Here is the code:

    id <- c("A1", "A1", "B10", "B10", "B500", "B500", "C100", "C100", "C100",
            "D40", "D40", "G100", "G100")
    d_1 <- c(rep(1.15, 2), rep(1.44, 2), rep(1.34, 2),
             rep(1.50, 3), rep(1.90, 2), rep(1.59, 2))
    set.seed(2)
    d_2 <- round(runif(13, -1, 1), 2)
    C <- data.frame(id, d_1, d_2)

       id  d_1   d_2
       A1 1.15 -0.63
       A1 1.15  0.40
      B10 1.44  0.15
      B10 1.44 -0.66
     B500 1.34  0.89
     B500 1.34  0.89
     C100 1.50 -0.74
     C100 1.50  0.67
     C100 1.50 -0.06
      D40 1.90  0.10
      D40 1.90  0.11
     G100 1.59 -0.52
     G100 1.59  0.52

Desired result:

    id2 <- c("A1", "B10", "B500", "C100", "D40", "G100")
    difference <- c(1.03, -0.81, 0, 0.68, 0.01, 1.04)
    diff_df <- data.frame(id2, difference)

      id2 difference
       A1       1.03
      B10      -0.81
     B500       0.00
     C100       0.68
      D40       0.01
     G100       1.04

I tried to do this using ddply to get the first and last entries, but I am not sure how to write the function argument in the second call (below) to get the desired result.

    C_1 <- ddply(C, .(id), function(x) x[c(1, nrow(x)), ])
    ddply(C_1, .(patient), function )

Honestly, I am not very familiar with ddply (from the plyr package); I got the code above from another Stack Exchange post.

My source data is a groupedData object, and I believe another way to approach this is to use gapply, but again I am struggling with the third argument (usually a function):

    grouped_C <- groupedData(d_1 ~ d_2 | id, data = C, FUN = mean,
                             labels = list(x = "", y = ""), units = list(""))
    x1 <- gapply(grouped_C, "d_2", first_entry)
    x2 <- gapply(grouped_C, "d_2", last_entry)

where first_entry and last_entry are functions that return the first and last records. I can then get the difference as x2 - x1. However, I am not sure what to supply as first_entry and last_entry in the code above (perhaps something using head or tail?).

Any help would be greatly appreciated.

2 answers

This is easy to do with dplyr . The last and first functions are very useful for this task.

    library(dplyr)  # load dplyr (install it first if necessary)

    # Create a new data frame (diff_df) holding the result of the chain below.
    # The %>% operator chains several operations together, so you do not have
    # to reference the data frame C at each step.
    diff_df <- C %>%
      group_by(id) %>%                                # group C by id
      summarize(difference = last(d_2) - first(d_2))  # per id: last d_2 minus first d_2

    # Result stored in diff_df:
    #     id difference
    # 1   A1       1.03
    # 2  B10      -0.81
    # 3 B500       0.00
    # 4 C100       0.68
    # 5  D40       0.01
    # 6 G100       1.04

Edit note: updated the post to use %>% instead of %.%, which is deprecated.
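
For anyone who would rather not add a package, the same last-minus-first calculation can be sketched in base R. This is only an illustration and not part of the original answer: it hard-codes the d_2 values printed in the question instead of regenerating them, and the helper name last_minus_first is made up for this example. The final merge step shows one possible way to attach the result back to C, which the question mentions as the next step.

```r
# Rebuild the example data (d_2 values copied from the question's printed output)
id  <- c("A1", "A1", "B10", "B10", "B500", "B500",
         "C100", "C100", "C100", "D40", "D40", "G100", "G100")
d_1 <- c(rep(1.15, 2), rep(1.44, 2), rep(1.34, 2),
         rep(1.50, 3), rep(1.90, 2), rep(1.59, 2))
d_2 <- c(-0.63, 0.40, 0.15, -0.66, 0.89, 0.89,
         -0.74, 0.67, -0.06, 0.10, 0.11, -0.52, 0.52)
C <- data.frame(id, d_1, d_2)

# Hypothetical helper: last element of a vector minus its first element
last_minus_first <- function(x) x[length(x)] - x[1]

# Apply the helper to d_2 within each id; rows are taken in their existing order
diffs <- tapply(C$d_2, C$id, last_minus_first)
diff_df <- data.frame(id = names(diffs), difference = as.vector(diffs))

# Attach the per-id difference to every row of the original data
C_with_diff <- merge(C, diff_df, by = "id")
```

Because "first" and "last" depend on row order, the data should be sorted as intended within each id before either version is run.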


If some ids have only a single record and those values should be left as they are, this will handle it. It is the same as docendo discimus's answer, but with an if-else component to handle the singleton cases:

    library(dplyr)
    diff_df <- C %>%
      group_by(id) %>%
      summarize(difference = if (n() > 1) last(d_2) - first(d_2) else d_2)
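
To see what the if-else branch changes, here is a small sketch on made-up data (the toy data frame and its values are invented for illustration and do not come from the question). It mirrors the same rule in base R so that it runs without extra packages.

```r
# Made-up data: "X1" has two records, "Y1" is a singleton
toy <- data.frame(id  = c("X1", "X1", "Y1"),
                  d_2 = c(0.20, 0.50, 0.70))

# Plain last-minus-first: a singleton is subtracted from itself
plain <- tapply(toy$d_2, toy$id, function(x) x[length(x)] - x[1])

# Same rule as the if-else above: take the difference only when an id
# has more than one record, otherwise keep the single value unchanged
guarded <- tapply(toy$d_2, toy$id,
                  function(x) if (length(x) > 1) x[length(x)] - x[1] else x)
```

Here plain gives 0 for Y1, while guarded keeps its original value of 0.70. Which behaviour is preferable depends on whether a lone record should count as "no change" or be passed through untouched.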