Group by ID and filter only the group that has the maximum average value

Question


I have a DF as follows:

a <- data.frame(group =c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5), count = c(12L, 80L, 102L, 97L, 118L, 115L, 4L, 13L, 136L,114L, 134L, 126L, 128L, 63L, 118L, 1L, 28L, 18L, 18L, 23L))

   group count
1      1    12
2      1    80
3      1   102
4      1    97
5      2   118
6      2   115
7      2     4
8      2    13
9      3   136
10     3   114
11     3   134
12     3   126
13     4   128
14     4    63
15     4   118
16     4     1
17     5    28
18     5    18
19     5    18
20     5    23

I used the following command:

library(dplyr)
a %>% group_by(group) %>% summarise(mean(count))

  group mean(count)
  (dbl)       (dbl)
1     1       72.75
2     2       62.50
3     3      127.50
4     4       77.50
5     5       21.75

I want to keep only the rows that belong to the group with the highest average. Here the third group has the maximum mean, so my result should be:

   group count
1     3   136
2     3   114
3     3   134
4     3   126

Can anyone think of how to do this?


r dataframe dplyr

haimen Jun 08 '16 at 18:55


4 answers

Use mutate instead of summarise, so that you keep the full data.frame:

library(dplyr)

new_data <- a %>% group_by(group) %>% 
  ##compute average count within groups
  mutate(AvgCt = mean(count)) %>% 
  ungroup() %>% 
  ##filter, looking for the maximum of the created variable
  filter(AvgCt == max(AvgCt))

> new_data
Source: local data frame [4 x 3]

  group count AvgCt
  (dbl) (int) (dbl)
1     3   136 127.5
2     3   114 127.5
3     3   134 127.5
4     3   126 127.5

Then, if you don't need the helper column, drop it:

new_data <- new_data %>% select(-AvgCt)

> new_data
Source: local data frame [4 x 2]

  group count
  (dbl) (int)
1     3   136
2     3   114
3     3   134
4     3   126
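For comparison, the same mutate-then-filter idea can be sketched in base R. This swaps in a different function and is not part of the answer above: ave() (whose default FUN is mean) returns each row's group mean, much like mutate(AvgCt = mean(count)) on grouped data, so no helper column has to be added and then dropped. The name avg_ct is illustrative.

```r
# Sample data from the question
a <- data.frame(group = c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5),
                count = c(12L, 80L, 102L, 97L, 118L, 115L, 4L, 13L, 136L, 114L,
                          134L, 126L, 128L, 63L, 118L, 1L, 28L, 18L, 18L, 23L))

# ave() computes mean(count) within each group and repeats it on every row,
# playing the same role as mutate(AvgCt = mean(count)) after group_by(group)
avg_ct <- ave(a$count, a$group)

# Keep the rows whose group mean equals the largest group mean
new_data <- a[avg_ct == max(avg_ct), ]
new_data
```

Because the filter compares against max() rather than using which.max(), every group that ties for the highest mean is kept, just as with the dplyr version.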


BarkleyBG Jun 08 '16 at 19:03

Maybe some xtabs/tabulate as well, for fun (if the group values are not simply consecutive integers starting at 1, you will need to wrap the which.max call in names):

a[a$group == which.max(xtabs(count ~ group, a) / tabulate(a$group)),]
#    group count
# 9      3   136
# 10     3   114
# 11     3   134
# 12     3   126

Or, combined with rowsum:

a[a$group == which.max(rowsum.default(a$count, a$group) / tabulate(a$group)), ]
#    group count
# 9      3   136
# 10     3   114
# 11     3   134
# 12     3   126
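A minimal sketch of the names() variant mentioned above, assuming a hypothetical copy b of the data whose group labels are the letters a to e instead of 1 to 5. tabulate() does not accept character vectors, so the labels are passed through factor(); xtabs() also converts them to a factor with sorted levels, so the sums and the group sizes come out in the same order and the division lines up.

```r
# Sample data from the question
a <- data.frame(group = c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5),
                count = c(12L, 80L, 102L, 97L, 118L, 115L, 4L, 13L, 136L, 114L,
                          134L, 126L, 128L, 63L, 118L, 1L, 28L, 18L, 18L, 23L))

# Hypothetical copy with character labels "a" to "e" instead of 1 to 5
b <- transform(a, group = letters[group])

# Group sums divided by group sizes; both are ordered by the sorted labels
group_means <- xtabs(count ~ group, b) / tabulate(factor(b$group))

# which.max() returns a position; names() turns it into the group label
best <- names(which.max(group_means))
b[b$group == best, ]
```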


David Arenburg Jun 08 '16 at 19:34


Using dplyr:

library(dplyr)

a %>% group_by(group) %>% 
    mutate(mc = mean(count)) %>% ungroup() %>% 
    filter(mc == max(mc)) %>% select(-mc)

Source: local data frame [4 x 2]

  group count
  (dbl) (int)
1     3   136
2     3   114
3     3   134
4     3   126

Another option with data.table:

library(data.table)
setDT(a)  # convert a to a data.table by reference

a[a[, .(mc = mean(count)), .(group)][mc == max(mc), -"mc", with = FALSE], on = "group"]
   group count
1:     3   136
2:     3   114
3:     3   134
4:     3   126


Psidom Jun 08 '16 at 19:02


lmo · Accepted Answer · 2016-06-08T19:08:56+0000

If you want a base R solution, you can do this with which.max and aggregate:

# calculate means by group
myMeans <- aggregate(count~group, a, FUN=mean)

# select the group with the max mean
maxMeanGroup <- a[a$group == myMeans[which.max(myMeans$count),]$group, ]
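which.max() returns only the first maximum, so if two groups shared the highest mean, only one of them would be returned. If you would rather keep every tied group, a variant of the aggregate approach can compare against max() and use %in%. This is a sketch under that assumption; topGroups and maxMeanGroups are illustrative names.

```r
# Sample data from the question
a <- data.frame(group = c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5),
                count = c(12L, 80L, 102L, 97L, 118L, 115L, 4L, 13L, 136L, 114L,
                          134L, 126L, 128L, 63L, 118L, 1L, 28L, 18L, 18L, 23L))

# Mean count per group: a data frame with columns group and count
myMeans <- aggregate(count ~ group, a, FUN = mean)

# Every group whose mean equals the maximum (more than one if there is a tie)
topGroups <- myMeans$group[myMeans$count == max(myMeans$count)]

# Keep the rows of all those groups
maxMeanGroups <- a[a$group %in% topGroups, ]
maxMeanGroups
```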

As a second method, you can try data.table:

library(data.table)
setDT(a)

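# which.max() returns a row position, so take the group label at that row;
# this also works when the group ids are not 1, 2, 3, ...
a[group == a[, list("count"=mean(count)), by=group
             ][which.max(count), group], ]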

which returns

   group count
1:     3   136
2:     3   114
3:     3   134
4:     3   126
