Labeling boxplot outliers in R

I have code that creates a boxplot using ggplot in R, and I want to label my outliers with the year and battle.

Here is the code that creates my boxplot:

    require(ggplot2)
    ggplot(seabattle, aes(x=PortugesOutcome, y=RatioPort2Dutch), xlim="OutCome", y="Ratio of Portuguese to Dutch/British ships") +
      geom_boxplot(outlier.size=2, outlier.colour="green") +
      stat_summary(fun.y="mean", geom="point", shape=23, size=3, fill="pink") +
      ggtitle("Portugese Sea Battles")

Can anyone help? I know the plot itself is right, I just want to label the outliers.

+15
r ggplot2 boxplot direct-labels
6 answers

The following is a reproducible solution that uses dplyr and the built-in mtcars dataset.

Walking through the code: first, create an is_outlier function that returns a boolean TRUE/FALSE for each value passed to it, depending on whether that value is an outlier. Then we do the "analysis" and plot the data. We first group_by our variable (cyl in this example; in your example it would be PortugesOutcome), and then add an outlier variable in the mutate call: if the drat value is an outlier (note that drat corresponds to RatioPort2Dutch in your example), we keep the drat value, otherwise we return NA so that no label is drawn for that point. Finally, we plot the results and draw the text values with geom_text and a label aesthetic equal to our new variable. In addition, we offset the text (shift it to the right) with hjust so that the values appear next to the outlier points rather than on top of them.

    library(dplyr)
    library(ggplot2)

    is_outlier <- function(x) {
      return(x < quantile(x, 0.25) - 1.5 * IQR(x) | x > quantile(x, 0.75) + 1.5 * IQR(x))
    }

    mtcars %>%
      group_by(cyl) %>%
      mutate(outlier = ifelse(is_outlier(drat), drat, as.numeric(NA))) %>%
      ggplot(., aes(x = factor(cyl), y = drat)) +
        geom_boxplot() +
        geom_text(aes(label = outlier), na.rm = TRUE, hjust = -0.3)
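To label with the year and battle from the question rather than the numeric value, the same idea could be adapted roughly as follows. This is only a sketch: the Year and Battle column names and the toy seabattle rows are invented for illustration (the question does not show the real columns), and the grouping is done with base R's ave() instead of dplyr so that the labeling step runs without extra packages.

```r
# Toy stand-in for the asker's data. Year and Battle are ASSUMED column names.
seabattle <- data.frame(
  PortugesOutcome = rep(c("Win", "Loss"), each = 6),
  RatioPort2Dutch = c(1.0, 1.1, 0.9, 1.2, 1.0, 5.0,
                      0.5, 0.6, 0.4, 0.5, 0.6, 3.0),
  Year   = 1601:1612,
  Battle = paste("Battle", LETTERS[1:12]),
  stringsAsFactors = FALSE
)

is_outlier <- function(x) {
  x < quantile(x, 0.25) - 1.5 * IQR(x) | x > quantile(x, 0.75) + 1.5 * IQR(x)
}

# Flag outliers within each outcome group (base R counterpart of group_by + mutate).
# ave() returns numeric 0/1 here, so convert back to logical.
flag <- as.logical(ave(seabattle$RatioPort2Dutch, seabattle$PortugesOutcome,
                       FUN = is_outlier))

# Build the label text only for outliers; everything else stays NA
seabattle$label <- ifelse(flag,
                          paste(seabattle$Year, seabattle$Battle),
                          NA_character_)

# Plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(seabattle, aes(x = PortugesOutcome, y = RatioPort2Dutch)) +
    geom_boxplot() +
    geom_text(aes(label = label), na.rm = TRUE, hjust = -0.1)
  print(p)
}
```

With real data, only the two column names inside paste() should need changing.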

Boxplot

+25

Does this work for you?

    library(ggplot2)
    library(data.table)

    #generate some data
    set.seed(123)
    n=500
    dat <- data.table(group=c("A","B"),value=rnorm(n))

By default, ggplot defines an outlier as any point more than 1.5 * IQR beyond the edges of the box.

    #function that takes in vector of data and a coefficient,
    #returns boolean vector if a certain point is an outlier or not
    check_outlier <- function(v, coef=1.5){
      quantiles <- quantile(v,probs=c(0.25,0.75))
      IQR <- quantiles[2]-quantiles[1]
      res <- v < (quantiles[1]-coef*IQR)|v > (quantiles[2]+coef*IQR)
      return(res)
    }

    #apply this to our data
    dat[,outlier:=check_outlier(value),by=group]
    dat[,label:=ifelse(outlier,"label","")]

    #plot
    ggplot(dat,aes(x=group,y=value))+geom_boxplot()+geom_text(aes(label=label),hjust=-0.3)
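As a quick sanity check of the 1.5 * IQR rule, here is check_outlier applied to a small made-up vector (the numbers are purely illustrative). With R's default quantile() method the quartiles of this vector are 3.25 and 7.75, so the IQR is 4.5 and the upper fence is 7.75 + 1.5 * 4.5 = 14.5.

```r
# Same function as in the answer, repeated so this snippet runs on its own
check_outlier <- function(v, coef = 1.5) {
  quantiles <- quantile(v, probs = c(0.25, 0.75))
  IQR <- quantiles[2] - quantiles[1]
  v < (quantiles[1] - coef * IQR) | v > (quantiles[2] + coef * IQR)
}

x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 16)  # made-up values

x[check_outlier(x)]            # 16 lies beyond the upper fence of 14.5
x[check_outlier(x, coef = 3)]  # empty: the wider fence (21.25) no longer flags 16
```

Raising coef widens the fences, which is the same knob geom_boxplot exposes through its coef argument.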


+7

To label outliers by row name (based on JasonAizkalns's answer):

    library(dplyr)
    library(ggplot2)
    library(tibble)

    is_outlier <- function(x) {
      return(x < quantile(x, 0.25) - 1.5 * IQR(x) | x > quantile(x, 0.75) + 1.5 * IQR(x))
    }

    dat <- mtcars %>%
      tibble::rownames_to_column(var="outlier") %>%
      group_by(cyl) %>%
      mutate(is_outlier=ifelse(is_outlier(drat), drat, as.numeric(NA)))
    dat$outlier[which(is.na(dat$is_outlier))] <- as.numeric(NA)

    ggplot(dat, aes(y=drat, x=factor(cyl))) +
      geom_boxplot() +
      geom_text(aes(label=outlier),na.rm=TRUE,nudge_y=0.05)

boxplot with outliers name

+6

This is similar to the answers above, but it takes the outliers directly from ggplot2, thus avoiding any potential mismatch between your outlier method and the one the plot uses:

    # calculate boxplot object
    g <- ggplot(mtcars, aes(factor(cyl), drat)) + geom_boxplot()

    # get list of outliers
    out <- ggplot_build(g)[["data"]][[1]][["outliers"]]

    # label list elements with factor levels
    names(out) <- levels(factor(mtcars$cyl))

    # convert to tidy data
    tidyout <- purrr::map_df(out, tibble::as_tibble, .id = "cyl")

    # plot boxplots with labels
    g + geom_text(data = tidyout, aes(cyl, value, label = value), hjust = -.3)
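To see why a mismatch in method can matter, the base R sketch below compares two common outlier rules on a made-up vector. It assumes, as the geom_boxplot documentation describes, that ggplot2 takes the box edges from the 25th and 75th percentiles, whereas boxplot.stats() (used by base boxplot()) takes them from Tukey's hinges. The two usually agree, but not always.

```r
x <- c(1:9, 15)  # made-up values chosen so the two rules disagree

# Quantile-based rule: box edges from quantile(), fences at 1.5 * IQR
q <- quantile(x, c(0.25, 0.75))
iqr <- q[2] - q[1]
out_quantile <- x[x < q[1] - 1.5 * iqr | x > q[2] + 1.5 * iqr]

# Hinge-based rule used by boxplot.stats() / base boxplot()
out_hinge <- boxplot.stats(x)$out

out_quantile  # 15 (upper fence is 14.5)
out_hinge     # numeric(0) (upper fence is 15.5)
```

Labels computed with one rule can therefore miss, or add, points relative to what the plot actually draws, which is what reading the outliers from ggplot_build() sidesteps.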


+3

You can do this simply inside ggplot itself using the appropriate stat_summary call.

    ggplot(mtcars, aes(x = factor(cyl), y = drat, fill = factor(cyl))) +
      geom_boxplot() +
      stat_summary(
        aes(label = round(stat(y), 1)),
        geom = "text",
        fun.y = function(y) { o <- boxplot.stats(y)$out; if(length(o) == 0) NA else o },
        hjust = -1
      )


0

With a small tweak to @JasonAizkalns's solution, you can label outliers with their row position in your data frame.

    mtcars[,'row'] <- row(mtcars)[,1]
    ...
    mutate(outlier = ifelse(is_outlier(drat), row, as.numeric(NA)))
    ...

I load the data frame into RStudio, so I can take a closer look at the data in the outlier rows.

-2
