ddply: several quantiles by group

How can I do this calculation:

    library(plyr)
    quantile(baseball$ab)
      0%  25%  50%  75% 100%
       0   25  131  435  705

by groups, say, by "team"? I want a data.frame with a "team" column and columns named "0%", "25%", "50%", "75%", "100%", i.e. one quantile call for each group.

Calling

    ddply(baseball, "team", quantile(ab))

is not the right approach. My problem is that the output of each grouped operation is a vector of length 5 here.

In other words, what is a neat solution for this (never mind the column names):

    m = data.frame()
    for (i in unique(baseball$team)) {
      m = rbind(m, quantile(baseball[baseball$team == i, ]$ab))
    }
    head(m, 3)
      X120 X120.1 X120.2 X120.3 X120.4
    1  120  120.0  120.0 120.00    120
    2  162  162.0  162.0 162.00    162
    3   89   89.0   89.0  89.00     89

4 answers

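With base R you can use tapply and do.call:

    library(plyr)  # only needed here for the baseball data set
    do.call("rbind", tapply(baseball$ab, baseball$team, quantile))
    # the extra argument is passed on to quantile() as probs
    do.call("rbind", tapply(baseball$ab, baseball$team, quantile, c(0.05, 0.1, 0.2)))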

Or, with ddply:

 ddply(baseball, .(team), function(x) quantile(x$ab)) 
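The tapply variant returns a matrix with the groups as row names rather than a data frame with a group column. A minimal sketch of the conversion follows, using the built-in mtcars data (wt grouped by cyl) as a stand-in so it runs without plyr. The names q and res are illustrative and not part of the original answer.

```r
# Per-group quantiles with base R only; mtcars stands in for baseball
q <- do.call("rbind", tapply(mtcars$wt, mtcars$cyl, quantile))

# Move the group labels from the row names into a proper column.
# check.names = FALSE keeps the "0%", "25%", ... column names intact.
res <- data.frame(cyl = rownames(q), q, check.names = FALSE)
rownames(res) <- NULL

print(res)
```

The same pattern should apply to baseball$ab grouped by baseball$team.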

With summarise, you define the calculation for each quantile as a separate column. Also use .(team) for the grouping variable.

    library(plyr)
    data(baseball)
    ddply(baseball, .(team), summarise,
          X0   = quantile(ab, probs = 0),
          X25  = quantile(ab, probs = 0.25),
          X50  = quantile(ab, probs = 0.50),
          X75  = quantile(ab, probs = 0.75),
          X100 = quantile(ab, probs = 1))

A slightly different approach using dplyr:

    library(tidyverse)
    baseball %>%
      group_by(team) %>%
      nest() %>%
      mutate(
        ret = map(data, ~quantile(.$ab, probs = c(0.25, 0.75))),
        ret = invoke_map(tibble, ret)
      ) %>%
      unnest(ret)

Here you can specify the required quantiles in the probs argument.

invoke_map seems necessary because quantile does not return a data frame; see this answer.

You can also put all this into a function:

    get_quantiles <- function(.data, .var, .probs = c(0.25, 0.75), .group_vars = vars()) {
      .var = deparse(substitute(.var))
      return(
        .data %>%
          group_by_at(.group_vars) %>%
          nest() %>%
          mutate(
            ret = map(data, ~quantile(.[[.var]], probs = .probs)),
            ret = invoke_map(tibble, ret)
          ) %>%
          unnest(ret, .drop = TRUE)
      )
    }

    mtcars %>% get_quantiles(wt, .group_vars = vars(cyl))

A newer approach is to use group_modify() from dplyr. Then you would call:

    baseball %>%
      group_by(team) %>%
      group_modify(~{
        quantile(.x$ab, probs = c(0.25, 0.75)) %>%
          tibble::enframe()
      }) %>%
      spread(name, value)
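For a self-contained version of the group_modify() idea, here is a sketch on the built-in mtcars data (wt grouped by cyl), so plyr's baseball data is not needed. It swaps spread() for tidyr::pivot_wider(), which is my substitution rather than what the answer used. The name res is illustrative.

```r
library(dplyr)
library(tidyr)

res <- mtcars %>%
  group_by(cyl) %>%
  # enframe() turns the named quantile vector into a two-column tibble
  # (name, value), which is what group_modify() needs to get back
  group_modify(~ tibble::enframe(quantile(.x$wt, probs = c(0.25, 0.75)))) %>%
  ungroup() %>%
  # one column per quantile, one row per group
  pivot_wider(names_from = name, values_from = value)

print(res)
```

Changing the probs vector should add or remove quantile columns without further changes.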

You can do this with non-standard quantiles in dplyr:

    library(plyr)
    data(baseball)
    library(dplyr)
    prob = c(0.2, 0.8)
    summarise(group_by(baseball, team),
              p1 = quantile(ab, probs = prob[1]),
              p2 = quantile(ab, probs = prob[2]))

NB: this is dplyr::summarise, not plyr::summarise.
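One way to avoid depending on package load order is to qualify the calls with dplyr:: explicitly. A minimal sketch on the built-in mtcars data (wt grouped by cyl) as a stand-in for baseball; the name res is illustrative.

```r
library(dplyr)

prob <- c(0.2, 0.8)

# The dplyr:: prefix makes sure plyr::summarise is not picked up by accident,
# whichever of the two packages was attached last
res <- mtcars %>%
  dplyr::group_by(cyl) %>%
  dplyr::summarise(
    p1 = quantile(wt, probs = prob[1]),
    p2 = quantile(wt, probs = prob[2])
  )

print(res)
```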
