The most common value (mode) by group

I am trying to find the most frequent value by group. In the following dataframe example:

    df <- data.frame(a=c(1,1,1,1,2,2,2,3,3), b=c(2,2,1,2,3,3,1,1,2))
    > df
      a b
    1 1 2
    2 1 2
    3 1 1
    4 1 2
    5 2 3
    6 2 3
    7 2 1
    8 3 1
    9 3 2

I would like to add a column "c" that holds the most frequent value of "b" when its values are grouped by "a". I need the following output:

    > df
      a b c
    1 1 2 2
    2 1 2 2
    3 1 1 2
    4 1 2 2
    5 2 3 3
    6 2 3 3
    7 2 1 3
    8 3 1 1
    9 3 2 1

I tried using table and tapply, but could not get it right. Is there a quick way to do this?
Thank you

3 answers

Building on David's comments, a solution is as follows:

    Mode <- function(x) {
      ux <- unique(x)
      ux[which.max(tabulate(match(x, ux)))]
    }

    library(dplyr)
    df %>% group_by(a) %>% mutate(c = Mode(b))

Note that for the tie when df$a is 3, the mode of b will be 1, because which.max returns the first of several equal maxima.
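The tie behaviour can be checked directly. The following is a minimal sketch, not part of the original answer: Mode is repeated so the snippet runs on its own, the names tie_first, tie_second and modes are illustrative, and tapply stands in for the grouped mutate so that dplyr is not required.

```r
Mode <- function(x) {
  ux <- unique(x)                          # distinct values, in order of first appearance
  ux[which.max(tabulate(match(x, ux)))]    # which.max returns the first maximum
}

df <- data.frame(a = c(1,1,1,1,2,2,2,3,3), b = c(2,2,1,2,3,3,1,1,2))

# Group a == 3 has b = c(1, 2): both values occur once, so the first one seen wins
tie_first  <- Mode(c(1, 2))
tie_second <- Mode(c(2, 1))

# One mode per group, computed in base R
modes <- tapply(df$b, df$a, Mode)
```

So with this Mode function the result for tied groups depends on the order of the rows within the group.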


We can get the 'mode' of 'b', grouped by 'a', using ave:

    Mode <- function(x) {
      ux <- unique(x)
      ux[which.max(tabulate(match(x, ux)))]
    }

    df$c <- with(df, ave(b, a, FUN=Mode))
    df$c
    #[1] 2 2 2 2 3 3 3 1 1

Or using data.table:

    library(data.table)
    setDT(df)[, c := Mode(b), by = a][]

Here is a base R method, which uses table to compute the cross tab, max.col to find the mode for each group, and rep along with rle to populate the mode by group.

    # calculate a cross tab, frequencies by group
    myTab <- table(df$a, df$b)

    # repeat the mode for each group, as calculated by colnames(myTab)[max.col(myTab)],
    # repeating by the number of times the group ID is observed
    df$c <- rep(colnames(myTab)[max.col(myTab)], rle(df$a)$lengths)

    df
      a b c
    1 1 2 2
    2 1 2 2
    3 1 1 2
    4 1 2 2
    5 2 3 3
    6 2 3 3
    7 2 1 3
    8 3 1 2
    9 3 2 2

Note that this assumes the data have been sorted by group. In addition, by default max.col breaks ties (multiple modes) at random. If you want the first or last value to be the mode, you can set this with the ties.method argument.
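As a sketch of how both caveats might be handled: passing ties.method = "first" to max.col makes the result deterministic, and looking up each row's group in the row names of the table (instead of relying on rle) removes the need for sorted data. The shuffled data frame df2 and the name modeByGroup are illustrative and not part of the original answer.

```r
df <- data.frame(a = c(1,1,1,1,2,2,2,3,3), b = c(2,2,1,2,3,3,1,1,2))

# Shuffle the rows to show that sorting by group is not required here
df2 <- df[c(9, 1, 5, 3, 7, 2, 8, 4, 6), ]

# Cross tab of frequencies: one row per group in a, one column per value of b
myTab <- table(df2$a, df2$b)

# ties.method = "first" picks the leftmost column (the smallest b) among tied modes
modeByGroup <- colnames(myTab)[max.col(myTab, ties.method = "first")]

# Map each row to its group's mode via the row names of the table;
# colnames are character, so convert back to numeric
df2$c <- as.numeric(modeByGroup[match(df2$a, rownames(myTab))])
```

Here "first" refers to column order in the table, that is, the smallest value of b among the tied modes, which is not necessarily the value that appears first in the data.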


Source: https://habr.com/ru/post/1216084/

