R: How to merge some columns while keeping other columns

I have a similar problem described here , but none of the solutions I tried work.

For a table like this:

Date Exercise Category Weight Reps EstMax RepxWeight Note 4/2/16 Deadlift Legs 135 7 166.4685 7x135 easy 4/2/16 Deadlift Legs 135 7 166.4685 7x135 kinda easy 4/2/16 Deadlift Legs 135 7 166.4685 7x135 tired 4/2/16 Bench Press Chest 95 5 110.8175 5x95 hard 4/2/16 Bench Press Chest 135 2 143.991 2x135 not hard 4/9/16 Bench Press Chest 135 2 143.991 2x135 a little hard 4/9/16 Bench Press Chest 135 2 143.991 2x135 super tired 4/18/16 Deadlift Legs 155 8 196.292 8x155 … 4/18/16 Deadlift Legs 155 5 180.8075 5x155 bad day 5/8/16 Deadlift Legs 185 3 203.4815 3x185 good day 5/8/16 Deadlift Legs 185 3 203.4815 3x185 felt easy 5/8/16 Bench Press Chest 115 4 130.318 4x115 easy 5/8/16 Bench Press Chest 115 4 130.318 4x115 hard 

I want aggregate to get rows with the max value for a specific column (e.g. EstMax ) based on several other columns (e.g. Date and Exercise ), but also keep all other columns in the row. And in the case of several records with the same maximum value, make the first record.

The expected result will be as follows:

 Date Exercise Category Weight Reps EstMax RepxWeight Note 4/2/16 Deadlift Legs 135 7 166.4685 7x135 easy 4/2/16 Bench Press Chest 135 2 143.991 2x135 not hard 4/9/16 Bench Press Chest 135 2 143.991 2x135 a little hard 4/18/16 Deadlift Legs 155 8 196.292 8x155 … 5/8/16 Deadlift Legs 185 3 203.4815 3x185 good day 5/8/16 Bench Press Chest 115 4 130.318 4x115 hard 

Examples of some methods I've tried; in each case, the "extra columns" are ultimately used as factors for aggregation, which I don't want.

 data <- structure(list(Date = structure(c(2L, 2L, 2L, 2L, 2L, 3L, 3L, 1L, 1L, 4L, 4L, 4L, 4L), .Label = c("4/18/16", "4/2/16", "4/9/16", "5/8/16"), class = "factor"), Exercise = structure(c(2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L), .Label = c("Bench Press", "Deadlift"), class = "factor"), Category = structure(c(2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L), .Label = c("Chest", "Legs"), class = "factor"), Weight = c(135L, 135L, 135L, 95L, 135L, 135L, 135L, 155L, 155L, 185L, 185L, 115L, 115L), Reps = c(7L, 7L, 7L, 5L, 2L, 2L, 2L, 8L, 5L, 3L, 3L, 4L, 4L), EstMax = c(166.4685, 166.4685, 166.4685, 110.8175, 143.991, 143.991, 143.991, 196.292, 180.8075, 203.4815, 203.4815, 130.318, 130.318), RepxWeight = structure(c(6L, 6L, 6L, 5L, 1L, 1L, 1L, 7L, 4L, 2L, 2L, 3L, 3L), .Label = c("2x135", "3x185", "4x115", "5x155", "5x95", "7x135", "8x155"), class = "factor"), Note = structure(c(4L, 8L, 11L, 7L, 9L, 2L, 10L, 1L, 3L, 6L, 5L, 4L, 7L), .Label = c("…", "a little hard", "bad day", "easy", "felt easy", "good day", "hard", "kinda easy", "not hard", "super tired", "tired"), class = "factor")), .Names = c("Date", "Exercise", "Category", "Weight", "Reps", "EstMax", "RepxWeight", "Note"), class = "data.frame", row.names = c(NA, -13L)) # base R aggregate(EstMax ~ Date + Exercise, data = data, FUN = max) # Date Exercise EstMax # 1 4/2/16 Bench Press 143.9910 # 2 4/9/16 Bench Press 143.9910 # 3 5/8/16 Bench Press 130.3180 # 4 4/18/16 Deadlift 196.2920 # 5 4/2/16 Deadlift 166.4685 # 6 5/8/16 Deadlift 203.4815 aggregate(EstMax ~ Date + Exercise + RepxWeight + Note, data = data, FUN = max) # Date Exercise RepxWeight Note EstMax # 1 4/18/16 Deadlift 8x155 … 196.2920 # 2 4/9/16 Bench Press 2x135 a little hard 143.9910 # 3 4/18/16 Deadlift 5x155 bad day 180.8075 # 4 5/8/16 Bench Press 4x115 easy 130.3180 # 5 4/2/16 Deadlift 7x135 easy 166.4685 # 6 5/8/16 Deadlift 3x185 felt easy 203.4815 # 7 5/8/16 Deadlift 3x185 good day 203.4815 # 8 5/8/16 Bench Press 4x115 hard 130.3180 # 9 4/2/16 Bench Press 5x95 hard 110.8175 # 10 4/2/16 Deadlift 7x135 kinda easy 166.4685 # 11 4/2/16 Bench Press 2x135 not hard 143.9910 # 12 4/9/16 Bench Press 2x135 super tired 143.9910 # 13 4/2/16 Deadlift 7x135 tired 166.4685 # data table library("data.table") data_dt <- data.table(data) data_dt[ , max(EstMax), by = c("Date", "Exercise")] # Date Exercise V1 # 1: 4/2/16 Deadlift 166.4685 # 2: 4/2/16 Bench Press 143.9910 # 3: 4/9/16 Bench Press 143.9910 # 4: 4/18/16 Deadlift 196.2920 # 5: 5/8/16 Deadlift 203.4815 # 6: 5/8/16 Bench Press 130.3180 data_dt[, max(EstMax), .(Date, Exercise, Weight, Reps, RepxWeight, Note)] # Date Exercise Weight Reps RepxWeight Note V1 # 1: 4/2/16 Deadlift 135 7 7x135 easy 166.4685 # 2: 4/2/16 Deadlift 135 7 7x135 kinda easy 166.4685 # 3: 4/2/16 Deadlift 135 7 7x135 tired 166.4685 # 4: 4/2/16 Bench Press 95 5 5x95 hard 110.8175 # 5: 4/2/16 Bench Press 135 2 2x135 not hard 143.9910 # 6: 4/9/16 Bench Press 135 2 2x135 a little hard 143.9910 # 7: 4/9/16 Bench Press 135 2 2x135 super tired 143.9910 # 8: 4/18/16 Deadlift 155 8 8x155 … 196.2920 # 9: 4/18/16 Deadlift 155 5 5x155 bad day 180.8075 # 10: 5/8/16 Deadlift 185 3 3x185 good day 203.4815 # 11: 5/8/16 Deadlift 185 3 3x185 felt easy 203.4815 # 12: 5/8/16 Bench Press 115 4 4x115 easy 130.3180 # 13: 5/8/16 Bench Press 115 4 4x115 hard 130.3180 

Basic R solutions are particularly preferred. Also saw the function which.max() , which could be useful, but could not figure out how to apply it to this.

Other related issues that I looked at but did not solve were:

Adding a non-aggregated column to an aggregated dataset based on aggregation of another column

Maintain a minimum value for each factor level

How to select the row with the maximum value in each group

join multiple columns in data.table

How to aggregate some columns while saving other columns in R?

+1
source share
4 answers

I understand that you are looking for a basic R solution, but meanwhile here is dplyr one:

 library(dplyr) data %>% group_by(Date, Exercise) %>% slice(which.max(EstMax)) # # A tibble: 6 x 8 # # Groups: Date, Exercise [6] # Date Exercise Category Weight Reps EstMax RepxWeight Note # <fctr> <fctr> <fctr> <int> <int> <dbl> <fctr> <fctr> # 1 4/18/16 Deadlift Legs 155 8 196.2920 8x155 … # 2 4/2/16 Bench Press Chest 135 2 143.9910 2x135 not hard # 3 4/2/16 Deadlift Legs 135 7 166.4685 7x135 easy # 4 4/9/16 Bench Press Chest 135 2 143.9910 2x135 a little hard # 5 5/8/16 Bench Press Chest 115 4 130.3180 4x115 easy # 6 5/8/16 Deadlift Legs 185 3 203.4815 3x185 good day 

Edit

data.table not my strength, but for the sake of completeness, here is my attempt:

 library(data.table) setDT(data)[, .SD[which.max(EstMax)], by = .(Date, Exercise)] # Date Exercise Category Weight Reps EstMax RepxWeight Note # 1: 4/2/16 Deadlift Legs 135 7 166.4685 7x135 easy # 2: 4/2/16 Bench Press Chest 135 2 143.9910 2x135 not hard # 3: 4/9/16 Bench Press Chest 135 2 143.9910 2x135 a little hard # 4: 4/18/16 Deadlift Legs 155 8 196.2920 8x155 … # 5: 5/8/16 Deadlift Legs 185 3 203.4815 3x185 good day # 6: 5/8/16 Bench Press Chest 115 4 130.3180 4x115 easy 
+4
source

Here is another method with dplyr :

 library(dplyr) library(lubridate) data %>% mutate(Date = mdy(Date)) %>% group_by(Date, Exercise) %>% arrange(desc(EstMax)) %>% slice(1) 

Result:

 # A tibble: 6 x 8 # Groups: Date, Exercise [6] Date Exercise Category Weight Reps EstMax RepxWeight Note <date> <fctr> <fctr> <int> <int> <dbl> <fctr> <fctr> 1 2016-04-02 Bench Press Chest 135 2 143.9910 2x135 not hard 2 2016-04-02 Deadlift Legs 135 7 166.4685 7x135 easy 3 2016-04-09 Bench Press Chest 135 2 143.9910 2x135 a little hard 4 2016-04-18 Deadlift Legs 155 8 196.2920 8x155 … 5 2016-05-08 Bench Press Chest 115 4 130.3180 4x115 easy 6 2016-05-08 Deadlift Legs 185 3 203.4815 3x185 good day 

or you can also use sqldf :

 library(sqldf) library(lubridate) data$Date = mdy(data$Date) sqldf("select *, max(EstMax) as EstMax2 from data group by Date, Exercise order by Date, Exercise") 

Result:

  Date Exercise Category Weight Reps EstMax RepxWeight Note EstMax2 1 2016-04-02 Bench Press Chest 135 2 143.9910 2x135 not hard 143.9910 2 2016-04-02 Deadlift Legs 135 7 166.4685 7x135 easy 166.4685 3 2016-04-09 Bench Press Chest 135 2 143.9910 2x135 a little hard 143.9910 4 2016-04-18 Deadlift Legs 155 8 196.2920 8x155 … 196.2920 5 2016-05-08 Bench Press Chest 115 4 130.3180 4x115 easy 130.3180 6 2016-05-08 Deadlift Legs 185 3 203.4815 3x185 good day 203.4815 
+2
source

One (not so correct) method that is stored in order to show one problem with summing all numeric columns independently:

 grpvar <- c("Date", "Exercise", "Category") merge( aggregate(data[,c("Weight", "Reps", "EstMax")], by = data[grpvar], FUN = max), aggregate(data[,c("RepxWeight", "Note")], by = data[grpvar], FUN = function(a) a[1]), by = grpvar ) # Date Exercise Category Weight Reps EstMax RepxWeight Note # 1 4/18/16 Deadlift Legs 155 8 196.2920 8x155 ... # 2 4/2/16 Bench Press Chest 135 5 143.9910 5x95 hard # 3 4/2/16 Deadlift Legs 135 7 166.4685 7x135 easy # 4 4/9/16 Bench Press Chest 135 2 143.9910 2x135 a little hard # 5 5/8/16 Bench Press Chest 115 4 130.3180 4x115 easy # 6 5/8/16 Deadlift Legs 185 3 203.4815 3x185 good day 

On 4/2/16 , your bench press shows a maximum weight of 135 and a maximum repetition of 5, but two did not occur on the same line in your data.

Here's a slightly (more correct) approach using your which.max thought:

 do.call(rbind, by(data, data[c("Date", "Exercise")], function(x) x[which.max(x$Weight),]) ) # Date Exercise Category Weight Reps EstMax RepxWeight Note # 5 4/2/16 Bench Press Chest 135 2 143.9910 2x135 not hard # 6 4/9/16 Bench Press Chest 135 2 143.9910 2x135 a little hard # 12 5/8/16 Bench Press Chest 115 4 130.3180 4x115 easy # 8 4/18/16 Deadlift Legs 155 8 196.2920 8x155 ... # 1 4/2/16 Deadlift Legs 135 7 166.4685 7x135 easy # 10 5/8/16 Deadlift Legs 185 3 203.4815 3x185 good day 

If for some reason it is possible to have one Exercise in more than one Category , you may need the second by argument to be data[c("Date","Exercise","Category")] .

(You can order the output with something like x[order(as.Date(x$Date, format="%m/%d/%Y")),] ... In fact, you probably meant that the $Date column is the actual Date class.)

+2
source

I know that you prefer basic R solutions, but dplyr offers a function 'top_n' that does exactly what you ask.

Use it once to get all instances of EstMax:

 library(dplyr) data %>% group_by(Exercise) %>% top_n(1, EstMax) # A tibble: 5 x 8 # Groups: Exercise [2] Date Exercise Category Weight Reps EstMax RepxWeight Note <fctr> <fctr> <fctr> <int> <int> <dbl> <fctr> <fctr> 1 4/2/16 Bench Press Chest 135 2 143.9910 2x135 not hard 2 4/9/16 Bench Press Chest 135 2 143.9910 2x135 a little hard 3 4/9/16 Bench Press Chest 135 2 143.9910 2x135 super tired 4 5/8/16 Deadlift Legs 185 3 203.4815 3x185 good day 5 5/8/16 Deadlift Legs 185 3 203.4815 3x185 felt easy 

Use it twice to get the first result of maximum results:

 data %>% group_by(Exercise) %>% top_n(1, EstMax) %>% top_n(1, Date) Selecting by Note # A tibble: 2 x 8 # Groups: Exercise [2] Date Exercise Category Weight Reps EstMax RepxWeight Note <fctr> <fctr> <fctr> <int> <int> <dbl> <fctr> <fctr> 1 4/9/16 Bench Press Chest 135 2 143.9910 2x135 super tired 2 5/8/16 Deadlift Legs 185 3 203.4815 3x185 good day 

Please note that this is the first result, not necessarily the earliest date. Thus, you must get settled by the date before using the second "top_n":

 data %>% group_by(Exercise) %>% top_n(1, EstMax) %>% mutate(Date = as.Date(Date, format = '%d/%m/%y')) %>% arrange(Date) %>% top_n(1) Selecting by Note # A tibble: 2 x 8 # Groups: Exercise [2] Date Exercise Category Weight Reps EstMax RepxWeight Note <date> <fctr> <fctr> <int> <int> <dbl> <fctr> <fctr> 1 2016-09-04 Bench Press Chest 135 2 143.9910 2x135 super tired 2 2016-08-05 Deadlift Legs 185 3 203.4815 3x185 good day 

[edit] Read the question incorrectly, here is a solution that provides the output you are requesting:

 data %>% group_by(Date, Exercise) %>% top_n(1, EstMax) %>% top_n(1) Selecting by Note # A tibble: 6 x 8 # Groups: Date, Exercise [6] Date Exercise Category Weight Reps EstMax RepxWeight Note <fctr> <fctr> <fctr> <int> <int> <dbl> <fctr> <fctr> 1 4/2/16 Deadlift Legs 135 7 166.4685 7x135 tired 2 4/2/16 Bench Press Chest 135 2 143.9910 2x135 not hard 3 4/9/16 Bench Press Chest 135 2 143.9910 2x135 super tired 4 4/18/16 Deadlift Legs 155 8 196.2920 8x155 … 5 5/8/16 Deadlift Legs 185 3 203.4815 3x185 good day 6 5/8/16 Bench Press Chest 115 4 130.3180 4x115 hard 
+1
source

All Articles