Proper use of scale_fill_manual() to create multi-colored histograms in ggplot2?

I have a number of data files that I'd like to explore in R. From each of them I plan to create a data frame with a column variable, which for this question I'll label foo. The values of foo lie in the interval [0, 7000]. As part of my data exploration, I'd like to create a 1D histogram of foo, but with a slight twist: values of foo in the range (1000, 7000] are especially interesting to me, so I'd like to color the individual bars of the histogram in this range using a color palette (later I intend to reuse the same palette to display data from some other columns, which I have left out of the data frame for now so as not to complicate my question unnecessarily). Conversely, values of foo in the range [0, 1000] are less interesting to me, but I'd still like to see them in the histogram, colored gray, whenever any such values are present.

In my code example below, I created an artificial data sample and tried to plot the histogram with ggplot2, choosing fill colors with scale_fill_manual(). I got a multi-colored histogram, but it does not look as expected: ggplot2 seems to have ignored my instructions on where to place the breaks between the colors. In particular, the problem seems to be related to missing data: intervals that contain no data are not assigned their color, although I intended them to be. As a result, the gray color ends up on the interval (1000, 1500] instead of [0, 1000], as I expected.

My question is: how can I get ggplot2 to assign specific fill colors to specific data ranges, even if some intervals are empty and no histogram bars are generated for them?

I've included my code below, along with a dummy example data frame and a hand-annotated version of the output it produces.

    library(ggplot2)

    # Minimum and maximum values of interest (for other data sets, additional
    # values that are of lesser interest may fall within the interval [0, 1000])
    lolim <- 1000
    hilim <- 7000
    bwdth <- 500

    # Construct sample data frame
    df <- data.frame(foo = c(1200, 1300, 1750, 2200, 2300, 2750, 3200, 3300,
                             3750, 4200, 4300, 4750, 6200, 6300, 6750))

    # Construct a discrete factor variable which can later be mapped onto
    # discrete color codes
    df$colcode <- cut(df$foo, breaks = c(0, seq(lolim, hilim, bwdth)),
                      include.lowest = TRUE)

    # Create the breaks and color codes to be used by scale_fill_manual()
    brk <- levels(df$colcode)
    ncol <- length(brk)

    # My expectation is that "#808080FF" (gray) will map onto the range
    # [0, 1000], while a palette consisting of 12 sequential shades of the
    # rainbow will be mapped onto the range (1000, 7000], in intervals of 500
    colors <- c("#808080FF", rainbow(ncol - 1))

    # Draw the histogram
    print(ggplot(df, aes(foo)) +
          geom_histogram(aes(fill = colcode), binwidth = bwdth) +
          scale_fill_manual("", breaks = brk, values = colors))

Hand-annotated sample output

r ggplot2
1 answer

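You can set the drop argument to FALSE. See ?discrete_scale, which describes drop as: drop unused factor levels from the scale (TRUE or FALSE). By default, levels with no data are dropped from the scale, so the colors are assigned in order to only the levels that are actually present.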

    ggplot(df, aes(foo)) +
      geom_histogram(aes(fill = colcode), binwidth = bwdth) +
      scale_fill_manual("", breaks = brk, values = colors, drop = FALSE)
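To see why drop = FALSE makes a difference here, it can help to list which levels of colcode contain no data. The following is a minimal sketch, assuming the same sample data as in the question; the names counts, unused, and pal are introduced for illustration only and are not part of the original code. As a variation (an assumption on my part, not something the fix above requires), it also names each color by its level, since scale_fill_manual() matches a named values vector by name rather than by position.

```r
# Same sample data and binning as in the question
lolim <- 1000
hilim <- 7000
bwdth <- 500
df <- data.frame(foo = c(1200, 1300, 1750, 2200, 2300, 2750, 3200, 3300,
                         3750, 4200, 4300, 4750, 6200, 6300, 6750))
df$colcode <- cut(df$foo, breaks = c(0, seq(lolim, hilim, bwdth)),
                  include.lowest = TRUE)

# Count observations per level; a count of 0 marks an unused level,
# i.e. one that the scale drops by default
counts <- table(df$colcode)
unused <- names(counts)[counts == 0]
print(unused)

# Illustrative variation (not from the original answer): name each color
# by its level so the mapping does not depend on which levels are present
brk <- levels(df$colcode)
pal <- setNames(c("#808080FF", rainbow(length(brk) - 1)), brk)

# Plot only if ggplot2 is installed; drop = FALSE keeps the empty levels
# in the scale and in the legend
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df, aes(foo)) +
    geom_histogram(aes(fill = colcode), binwidth = bwdth) +
    scale_fill_manual("", values = pal, drop = FALSE)
  print(p)
}
```

With this sample, three levels come back as unused: the interval [0, 1000] and the two intervals between 5000 and 6000. Because the first level is among them, the default behavior shifts gray onto the first interval that does contain data.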
