Although it takes only a few lines to build multiple, overlapping histograms in ggplot2, the results are not always satisfactory. Borders and colors must be chosen carefully so that the eye can distinguish between the histograms.
The following functions combine border colors, opacity, and overlaid density plots so that the viewer can tell the distributions apart.
Single histogram:
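library(ggplot2)

plot_histogram <- function(df, feature) {
  # `feature` is a column name passed as a string; .data[[feature]] looks it up in df
  plt <- ggplot(df, aes(x = .data[[feature]])) +
    # histogram scaled to density so that it shares a y-axis with the density curve
    geom_histogram(aes(y = after_stat(density)), alpha = 0.7, fill = "#33AADE", color = "black") +
    geom_density(alpha = 0.3, fill = "red") +
    # dashed line at the mean (`linewidth` needs ggplot2 >= 3.4.0; older versions use `size`)
    geom_vline(xintercept = mean(df[[feature]]), color = "black", linetype = "dashed", linewidth = 1) +
    labs(x = feature, y = "Density")
  print(plt)
}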
Multiple histograms:
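plot_multi_histogram <- function(df, feature, label_column) {
  # `feature` and `label_column` are column names passed as strings
  plt <- ggplot(df, aes(x = .data[[feature]], fill = .data[[label_column]])) +
    # position = "identity" overlays the groups instead of stacking them
    geom_histogram(aes(y = after_stat(density)), alpha = 0.7, position = "identity", color = "black") +
    geom_density(alpha = 0.7) +
    # dashed line at the overall mean (`linewidth` needs ggplot2 >= 3.4.0; older versions use `size`)
    geom_vline(xintercept = mean(df[[feature]]), color = "black", linetype = "dashed", linewidth = 1) +
    labs(x = feature, y = "Density")
  plt + guides(fill = guide_legend(title = label_column))
}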
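The function above draws a single dashed line at the overall mean of the feature. If one line per category is preferred, a possible variant is sketched below. The names compute_group_means and plot_multi_histogram_group_means are hypothetical and not part of the original functions, and the sketch assumes ggplot2 3.4.0 or later for linewidth.

```r
library(ggplot2)

# Hypothetical helper: one row per category with the mean of `feature`.
compute_group_means <- function(df, feature, label_column) {
  means <- aggregate(df[[feature]], by = list(df[[label_column]]), FUN = mean)
  names(means) <- c(label_column, "group_mean")
  means
}

# Hypothetical variant of plot_multi_histogram with one mean line per category.
plot_multi_histogram_group_means <- function(df, feature, label_column) {
  group_means <- compute_group_means(df, feature, label_column)
  ggplot(df, aes(x = .data[[feature]], fill = .data[[label_column]])) +
    geom_histogram(aes(y = after_stat(density)), alpha = 0.7,
                   position = "identity", color = "black") +
    geom_density(alpha = 0.7) +
    # inherit.aes = FALSE: group_means has no `feature` column to inherit x from
    geom_vline(data = group_means,
               aes(xintercept = group_mean, color = .data[[label_column]]),
               linetype = "dashed", linewidth = 1, inherit.aes = FALSE) +
    labs(x = feature, y = "Density", fill = label_column, color = label_column)
}

p <- plot_multi_histogram_group_means(iris, "Sepal.Width", "Species")
```

Computing the means up front keeps the line layer down to one row per category, and coloring the lines by category ties each one to its histogram.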
Usage:
Just pass your data frame to the above functions along with the desired arguments:
plot_histogram(iris, 'Sepal.Width')

plot_multi_histogram(iris, 'Sepal.Width', 'Species')

The additional parameter of plot_multi_histogram, label_column, is the name of the column containing the category labels.
The overlay handling shows more dramatically on a data frame that holds several distributions with different means:
a <- data.frame(n = rnorm(1000, mean = 1), category = rep('A', 1000))
b <- data.frame(n = rnorm(1000, mean = 2), category = rep('B', 1000))
c <- data.frame(n = rnorm(1000, mean = 3), category = rep('C', 1000))
d <- data.frame(n = rnorm(1000, mean = 4), category = rep('D', 1000))
e <- data.frame(n = rnorm(1000, mean = 5), category = rep('E', 1000))
f <- data.frame(n = rnorm(1000, mean = 6), category = rep('F', 1000))
many_distros <- do.call('rbind', list(a, b, c, d, e, f))
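The six near-identical lines can also be generated in a loop. The following is only a sketch of an equivalent construction: the name many_distros_alt and the fixed seed are assumptions added for illustration and reproducibility.

```r
set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible

categories <- LETTERS[1:6]  # "A" through "F"

# One data frame per category, with the mean increasing from 1 to 6.
many_distros_alt <- do.call(rbind, lapply(seq_along(categories), function(i) {
  data.frame(n = rnorm(1000, mean = i), category = categories[i])
}))

# Sample mean per category, as a quick sanity check on the generated data.
tapply(many_distros_alt$n, many_distros_alt$category, mean)
```

With 1000 draws per category, the sample means should fall close to the requested means of 1 to 6.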
Passing this data frame as before (and enlarging the plot via options):
options(repr.plot.width = 20, repr.plot.height = 8)
plot_multi_histogram(many_distros, 'n', 'category')
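The repr.plot.width and repr.plot.height options are read by the R kernel for Jupyter, so they are unlikely to change anything in a plain R session. One way to get a similarly wide figure elsewhere is to write the plot to a file with ggsave. The sketch below builds a small stand-in plot so that it runs on its own; the file name and temporary directory are assumptions.

```r
library(ggplot2)

# Stand-in plot so the sketch is self-contained; in practice this would be
# the object returned by plot_multi_histogram().
p <- ggplot(iris, aes(x = Sepal.Width, fill = Species)) +
  geom_histogram(aes(y = after_stat(density)), bins = 30, alpha = 0.7,
                 position = "identity", color = "black")

# Assumed output location; width and height are in inches by default.
out_file <- file.path(tempdir(), "multi_histogram.png")
ggsave(out_file, plot = p, width = 20, height = 8)
```

This works with plot_multi_histogram because that function returns the plot object, whereas plot_histogram prints the plot itself.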
