When using ggplot2, can I set the color of histogram bars without potentially obscuring low values?

When geom_histogram() is called with the color and fill arguments, ggplot2 draws an outline along the entire x-axis range, which makes it impossible to visually distinguish between a low value and a zero value.

Running the following code:

 ggplot(esubset, aes(x=exectime)) + geom_histogram(binwidth = 0.5) + theme_bw() + scale_x_continuous(breaks=seq(0,20), limits=c(0,20)) 

will result in

a histogram without color attributes

This is visually very unappealing. To fix it, I would like to use the following instead:

 ggplot(esubset, aes(x=exectime)) + geom_histogram(binwidth = 0.5, colour='black', fill='gray') + theme_bw() + scale_x_continuous(breaks=seq(0,20), limits=c(0,20)) 

leading to

a histogram with color attributes

The problem is that I can no longer tell whether exectime contains values beyond 10, since a few occurrences of 12, for example, will be hidden behind the horizontal line that spans the entire x-axis.

1 answer

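Use coord_cartesian instead of scale_x_continuous to set the limits. coord_cartesian sets the axis range without affecting how the data are plotted. Limits set on the scale, by contrast, feed into the binning itself, so empty bins and their outlines are drawn all the way out to the limits. Even with coord_cartesian you can still use scale_x_continuous to set the breaks, but coord_cartesian overrides any effect of scale_x_continuous on how the data are plotted.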
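To check the difference between the two kinds of limits, here is a minimal sketch. The `toy` data frame and its outlier at 30 are made up for illustration, and `layer_data()` is used only to look at what the histogram layer actually computed. With limits on the scale, the out-of-range row should be dropped before binning; with coord_cartesian it should still be counted.

```r
library(ggplot2)

set.seed(1)
# Hypothetical toy data: 100 values near 10 plus a single outlier at 30
toy <- data.frame(value = c(rnorm(100, 10, 1), 30))
p <- ggplot(toy, aes(value)) + geom_histogram(binwidth = 0.5)

# Limits on the scale: out-of-range rows are removed (with a warning) before binning
binned_scale <- suppressWarnings(
  layer_data(p + scale_x_continuous(limits = c(5, 25)), 1)
)

# Limits on the coordinate system: every row is binned, only the view is zoomed
binned_coord <- layer_data(p + coord_cartesian(xlim = c(5, 25)), 1)

n_scale <- sum(binned_scale$count)
n_coord <- sum(binned_coord$count)
print(c(scale = n_scale, coord = n_coord))

# Number of bins computed in each case
print(c(scale = nrow(binned_scale), coord = nrow(binned_coord)))
```

Comparing the two totals shows whether any observations were discarded by the scale limits, which is easy to miss when only looking at the plot.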

In the fake data below, note that I have added data for several very small bars.

 set.seed(4958)
 dat = data.frame(value=c(rnorm(5000, 10, 1), rep(15:20,1:6)))

 ggplot(dat, aes(value)) +
   geom_histogram(binwidth=0.5, color="black", fill="grey") +
   theme_bw() +
   scale_x_continuous(limits=c(5,25), breaks=5:25) +
   ggtitle("scale_x_continuous")

 ggplot(dat, aes(value)) +
   geom_histogram(binwidth=0.5, color="black", fill="grey") +
   theme_bw() +
   coord_cartesian(xlim=c(5,25)) +
   scale_x_continuous(breaks=5:25) +
   ggtitle("coord_cartesian")


As the plots produced by the code above show, if there are bins with count = 0 within the data range, ggplot will draw a zero line, even with coord_cartesian. This makes it hard to see the bar at 15, which has a height of 1. You can make the border thinner with the lwd ("line width") argument so that small bars are less obscured:

 ggplot(dat, aes(value)) + geom_histogram(binwidth=0.5, color="black", fill="grey", lwd=0.3) + theme_bw() + coord_cartesian(xlim=c(5,25)) + scale_x_continuous(breaks=5:25) + ggtitle("coord_cartesian") 


Another option is to pre-summarize the data and plot it with geom_bar, which leaves gaps between the bars and so avoids the need for border lines to mark the bar edges:

 library(dplyr)
 library(tidyr)
 library(zoo)

 bins = seq(floor(min(dat$value)) - 1.75, ceiling(max(dat$value)) + 1.25, 0.5)

 dat.binned = dat %>%
   count(bin=cut(value, bins, right=FALSE)) %>%   # Bin the data
   complete(bin, fill=list(n=0)) %>%              # Restore empty bins and fill with zeros
   mutate(bin = rollmean(bins,2)[-length(bins)])  # Convert bin from factor to numeric with value = mean of bin range

 ggplot(dat.binned, aes(bin, n)) +
   geom_bar(stat="identity", fill=hcl(240,100,30)) +
   theme_bw() +
   scale_x_continuous(breaks=0:21)
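If you would rather keep the black outlines, a variation on the same pre-summarizing idea is to drop the empty bins before plotting, so that no outline is drawn at zero height. This is only a sketch under a few assumptions: it bins with base R's `hist(plot = FALSE)` rather than the dplyr pipeline above, and the names `breaks`, `binned` and `nonempty` are made up for illustration.

```r
library(ggplot2)

set.seed(4958)
dat <- data.frame(value = c(rnorm(5000, 10, 1), rep(15:20, 1:6)))

# Bin edges every 0.5, wide enough to cover every value
breaks <- seq(floor(min(dat$value)), ceiling(max(dat$value)) + 0.5, by = 0.5)
h <- hist(dat$value, breaks = breaks, right = FALSE, plot = FALSE)

# One row per bin: bin midpoint and count
binned <- data.frame(mid = h$mids, n = h$counts)

# Keep only bins that contain data, so empty bins get no outline
nonempty <- binned[binned$n > 0, ]

p <- ggplot(nonempty, aes(mid, n)) +
  geom_col(width = 0.5, colour = "black", fill = "grey") +
  theme_bw() +
  coord_cartesian(xlim = c(5, 25)) +
  scale_x_continuous(breaks = 5:25)
print(p)
```

Because the empty bins are simply absent from `nonempty`, any bar that is drawn corresponds to at least one observation, however short it is.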
