The most common value (mode) by group

Question

I am trying to find the most frequent value by group. In the following dataframe example:

    df <- data.frame(a = c(1,1,1,1,2,2,2,3,3), b = c(2,2,1,2,3,3,1,1,2))
    > df
      a b
    1 1 2
    2 1 2
    3 1 1
    4 1 2
    5 2 3
    6 2 3
    7 2 1
    8 3 1
    9 3 2

I would like to add a column “c” that holds the most frequent value of “b” within each group of “a”. I need the following output:

    > df
      a b c
    1 1 2 2
    2 1 2 2
    3 1 1 2
    4 1 2 2
    5 2 3 3
    6 2 3 3
    7 2 1 3
    8 3 1 1
    9 3 2 1

I tried using table and tapply, but could not get it to work. Is there a quick way to do this?
Thank you

+5

r

Asif shakeel Mar 25 '15 at 12:18

3 answers

We can get the 'mode' of 'b', grouped by 'a', using ave

    Mode <- function(x) {
      ux <- unique(x)
      ux[which.max(tabulate(match(x, ux)))]
    }
    df$c <- with(df, ave(b, a, FUN = Mode))
    df$c
    #[1] 2 2 2 2 3 3 3 1 1
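To see why the Mode helper works, it may help to run its steps one at a time on a single group. This is only an illustrative walk-through on the values of b for a == 1; the intermediate names idx and counts are mine, not part of the answer.

```r
# Values of b in the group a == 1 from the question
x <- c(2, 2, 1, 2)

ux <- unique(x)          # distinct values, in order of first appearance: 2 1
idx <- match(x, ux)      # position of each element of x within ux: 1 1 2 1
counts <- tabulate(idx)  # how often each distinct value occurs: 3 1

# which.max() returns the position of the largest count, so this picks 2
ux[which.max(counts)]
```

ave() then applies this function to b within each level of a and recycles the single result to every row of the group, which is why df$c has the same length as df$b.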

Or using data.table

    library(data.table)
    setDT(df)[, c := Mode(b), by = a][]

+2

akrun Mar 25 '15 at 12:21

Here is a base R method, which uses table to compute the cross tab, max.col to find the mode for each group, and rep along with rle to fill in the mode by group.

    # calculate a cross tab, frequencies by group
    myTab <- table(df$a, df$b)
    # repeat the mode for each group, as calculated by colnames(myTab)[max.col(myTab)],
    # repeating by the number of times the group ID is observed
    df$c <- rep(colnames(myTab)[max.col(myTab)], rle(df$a)$lengths)
    df
      a b c
    1 1 2 2
    2 1 2 2
    3 1 1 2
    4 1 2 2
    5 2 3 3
    6 2 3 3
    7 2 1 3
    8 3 1 2
    9 3 2 2

Note that this assumes that the data has been sorted by group. In addition, max.col breaks ties (multiple modes) at random by default. If you want the first or last tied value to be taken as the mode, you can set this with the ties.method argument.
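As a rough sketch of both points, the variant below passes ties.method = "first" to max.col and looks the mode up by group name instead of relying on rle, so the rows do not have to be sorted. The lookup vector modes and the conversion back to numeric are my own additions, not part of the answer above.

```r
df <- data.frame(a = c(1,1,1,1,2,2,2,3,3), b = c(2,2,1,2,3,3,1,1,2))

# cross tab: one row per group in a, one column per value of b
myTab <- table(df$a, df$b)

# ties.method = "first" picks the leftmost column among tied maxima,
# i.e. the smallest value of b, because table() sorts its columns
modes <- colnames(myTab)[max.col(myTab, ties.method = "first")]
names(modes) <- rownames(myTab)

# look each row's group up by name; colnames are character, so convert back
df$c <- as.numeric(modes[as.character(df$a)])
df$c
#[1] 2 2 2 2 3 3 3 1 1
```

With ties.method = "last" the tied group a == 3 would get 2 instead of 1.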

0

lmo Nov 25 '16 at 18:39

dimitris_ps · Accepted Answer · 2015-03-25T12:46:57+0000

Based on David's comments, your solution is as follows:

    Mode <- function(x) {
      ux <- unique(x)
      ux[which.max(tabulate(match(x, ux)))]
    }
    library(dplyr)
    df %>% group_by(a) %>% mutate(c = Mode(b))

Note that in the case of a tie, as when df$a is 3, the mode of b will be 1.
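The tie behaviour follows from how Mode is written: unique keeps values in order of first appearance and which.max returns the first maximum, so a tie goes to whichever value shows up first in the group. A small check, using made-up input vectors for illustration:

```r
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

Mode(c(1, 2))        # tie, 1 appears first
#[1] 1
Mode(c(2, 1))        # same tie, but now 2 appears first
#[1] 2
Mode(c(2, 2, 1, 2))  # no tie, 2 is the clear mode
#[1] 2
```

So the result for tied groups depends on row order; sort the data first if you need a deterministic choice such as the smallest tied value.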
