Histogram overlay with ggplot2 in R

I am new to R and am trying to plot 3 histograms on the same chart. Everything works, but my problem is that you can't see where two histograms overlap; they look rather cut off: [image: histogram]

When I make density plots, it looks perfect: each curve is surrounded by a black outline, and the colors look different where the curves overlap: [image: density plot]

Can someone tell me if something like this can be achieved using the histograms in the 1st image? This is the code I'm using:

    lowf0 <- read.csv(....)
    mediumf0 <- read.csv(....)
    highf0 <- read.csv(....)
    lowf0$utt <- 'low f0'
    mediumf0$utt <- 'medium f0'
    highf0$utt <- 'high f0'
    histogram <- rbind(lowf0, mediumf0, highf0)
    ggplot(histogram, aes(f0, fill = utt)) + geom_histogram(alpha = 0.2)

Thanks in advance for the helpful tips!

+95
r ggplot2
Aug 05 2018-11-11T00:
3 answers

Your current code:

 ggplot(histogram, aes(f0, fill = utt)) + geom_histogram(alpha = 0.2) 

tells ggplot to build one histogram using all the values in f0, and then color the bars of this single histogram according to the variable utt.

What you want instead is three separate histograms, with alpha blending so that they are visible through each other. So you probably want three separate calls to geom_histogram, where each one gets its own data frame and fill:

    ggplot(histogram, aes(f0)) +
        geom_histogram(data = lowf0, fill = "red", alpha = 0.2) +
        geom_histogram(data = mediumf0, fill = "blue", alpha = 0.2) +
        geom_histogram(data = highf0, fill = "green", alpha = 0.2)

Here is a concrete example with some output:

    dat <- data.frame(xx = c(runif(100,20,50),runif(100,40,80),runif(100,0,30)),
                      yy = rep(letters[1:3],each = 100))

    ggplot(dat,aes(x=xx)) +
        geom_histogram(data=subset(dat,yy == 'a'),fill = "red", alpha = 0.2) +
        geom_histogram(data=subset(dat,yy == 'b'),fill = "blue", alpha = 0.2) +
        geom_histogram(data=subset(dat,yy == 'c'),fill = "green", alpha = 0.2)

which produces something like this:

[image: resulting plot]

Edited to fix a typo: you want fill, not colour.
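To make the fill/colour distinction concrete, here is a minimal sketch on the same kind of toy data: fill sets the inside of the bars, while colour draws the outline, which is one way to get the black frame the question asks about. The object name p_border, the set.seed call and bins = 30 are assumptions added for this illustration.

```r
library(ggplot2)

set.seed(1)  # reproducible toy data, not the asker's f0 measurements
dat <- data.frame(
  xx = c(runif(100, 20, 50), runif(100, 40, 80), runif(100, 0, 30)),
  yy = rep(letters[1:3], each = 100)
)

# fill colours the inside of each bar; colour draws its outline.
p_border <- ggplot(dat, aes(x = xx)) +
  geom_histogram(data = subset(dat, yy == "a"), fill = "red",
                 colour = "black", alpha = 0.2, bins = 30) +
  geom_histogram(data = subset(dat, yy == "b"), fill = "blue",
                 colour = "black", alpha = 0.2, bins = 30) +
  geom_histogram(data = subset(dat, yy == "c"), fill = "green",
                 colour = "black", alpha = 0.2, bins = 30)

print(p_border)
```

Note that alpha is applied to the fill; in my experience the outline stays fully opaque, so the bar edges remain visible where the histograms overlap.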

+99
Aug 05 2018-11-11T00:

Using @joran's sample data,

 ggplot(dat, aes(x=xx, fill=yy)) + geom_histogram(alpha=0.2, position="identity") 

Note that the default position for geom_histogram is "stack".

See "Position Adjustment" on this page:

docs.ggplot2.org/current/geom_histogram.html
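As a rough way to check the difference between the two positions without looking at the picture, the sketch below compares the computed bar coordinates using layer_data() from ggplot2. The object names, the set.seed call and bins = 30 are assumptions made for this illustration.

```r
library(ggplot2)

set.seed(1)  # reproducible toy data in the style of @joran's example
dat <- data.frame(
  xx = c(runif(100, 20, 50), runif(100, 40, 80), runif(100, 0, 30)),
  yy = rep(letters[1:3], each = 100)
)

# Default position: bars of different groups are stacked on top of each other.
p_stack <- ggplot(dat, aes(x = xx, fill = yy)) +
  geom_histogram(alpha = 0.2, bins = 30)

# position = "identity": every group is drawn from the baseline, so bars overlap.
p_identity <- ggplot(dat, aes(x = xx, fill = yy)) +
  geom_histogram(alpha = 0.2, bins = 30, position = "identity")

stack_bars    <- layer_data(p_stack)
identity_bars <- layer_data(p_identity)

# Do all bars start at zero? Compare the two positions.
print(all(identity_bars$ymin == 0))
print(all(stack_bars$ymin == 0))
```

With "stack", a bar that shares a bin with another group starts where the other bar ends, which is why the overlap is not visible in the original plot.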

+185
Aug 05 2018-11-11T00:

Although only a few lines are required to plot multiple/overlapping histograms in ggplot2, the results are not always satisfactory. Borders and colours need to be chosen with care so that the eye can tell the histograms apart.

The following functions balance border colours, opacity, and superimposed density plots so that the viewer can distinguish between the distributions.

Single histogram:

    # Histogram on the density scale, with a density curve and a dashed line at the mean
    plot_histogram <- function(df, feature) {
        plt <- ggplot(df, aes(x=eval(parse(text=feature)))) +
            geom_histogram(aes(y = ..density..), alpha=0.7, fill="#33AADE", color="black") +
            geom_density(alpha=0.3, fill="red") +
            geom_vline(aes(xintercept=mean(eval(parse(text=feature)))), color="black", linetype="dashed", size=1) +
            labs(x=feature, y = "Density")
        print(plt)
    }

Multi-histogram:

    # One histogram and density curve per value of label_column, drawn over each other
    plot_multi_histogram <- function(df, feature, label_column) {
        plt <- ggplot(df, aes(x=eval(parse(text=feature)), fill=eval(parse(text=label_column)))) +
            geom_histogram(alpha=0.7, position="identity", aes(y = ..density..), color="black") +
            geom_density(alpha=0.7) +
            geom_vline(aes(xintercept=mean(eval(parse(text=feature)))), color="black", linetype="dashed", size=1) +
            labs(x=feature, y = "Density")
        plt + guides(fill=guide_legend(title=label_column))
    }

Usage:

Just pass your data frame to the above functions along with the desired arguments:

 plot_histogram(iris, 'Sepal.Width') 

[image: resulting plot]

 plot_multi_histogram(iris, 'Sepal.Width', 'Species') 

[image: resulting plot]

The extra argument to plot_multi_histogram is the name of the column containing the category labels.

We can see this more dramatically by creating a data frame with several distributions that have different means:

    a <- data.frame(n=rnorm(1000, mean = 1), category=rep('A', 1000))
    b <- data.frame(n=rnorm(1000, mean = 2), category=rep('B', 1000))
    c <- data.frame(n=rnorm(1000, mean = 3), category=rep('C', 1000))
    d <- data.frame(n=rnorm(1000, mean = 4), category=rep('D', 1000))
    e <- data.frame(n=rnorm(1000, mean = 5), category=rep('E', 1000))
    f <- data.frame(n=rnorm(1000, mean = 6), category=rep('F', 1000))
    many_distros <- do.call('rbind', list(a,b,c,d,e,f))

Passing in the data frame as before (and widening the chart using options):

    options(repr.plot.width = 20, repr.plot.height = 8)
    plot_multi_histogram(many_distros, 'n', 'category')

[image: resulting plot]
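For readers on a recent ggplot2, the same idea can be sketched without eval(parse()). The variant below is only an illustration, not a replacement for the functions above: the name plot_multi_histogram2 and the bins argument are hypothetical, and it assumes ggplot2 3.4.0 or later, where after_stat() and linewidth are preferred over ..density.. and size.

```r
library(ggplot2)

# Hypothetical variant of plot_multi_histogram; assumes ggplot2 >= 3.4.0.
plot_multi_histogram2 <- function(df, feature, label_column, bins = 30) {
  ggplot(df, aes(x = .data[[feature]], fill = .data[[label_column]])) +
    # Histograms on the density scale, overlapping rather than stacked
    geom_histogram(aes(y = after_stat(density)), bins = bins,
                   alpha = 0.7, position = "identity", colour = "black") +
    geom_density(alpha = 0.7) +
    # Overall mean of the feature, computed once outside aes()
    geom_vline(xintercept = mean(df[[feature]]), colour = "black",
               linetype = "dashed", linewidth = 1) +
    labs(x = feature, y = "Density", fill = label_column)
}

p <- plot_multi_histogram2(iris, "Sepal.Width", "Species")
print(p)
```

The .data pronoun looks the column up by its name as a string, so the function still takes the same character arguments as the original.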

0
Dec 08 '18 at 6:20


