Overlaid density curve excludes histogram values

I want to overlay a density curve on the frequency histogram I built. For the histogram I used aes(y=..count../40), because 40 is the sample size of each group. I used aes(y=..density..*0.1) to force the density to lie between 0 and 1, since my bin width is 0.1. However, the density curve does not match my data and excludes the values equal to 1.0 (note that the histogram shows those values piled up in the bin [1.0, 1.1), but the density curve ends at 1.0).

This is my data:

 data<-structure(list(variable = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("E1", "test" ), class = "factor"), value = c(0.288888888888889, 0.0817901234567901, 0.219026548672566, 0.584795321637427, 0.927554980595084, 0.44661095636026, 1, 0.653780942692438, 1, 0.806451612903226, 1, 0.276794335371741, 1, 0.930109557990178, 0.776864728192162, 0.824909747292419, 1, 1, 1, 1, 1, 0.0875912408759124, 0.308065494238933, 1, 0.0258064516129032, 0.0167322834645669, 1, 1, 0.355605889014723, 0.310344827586207, 0.106598984771574, 0.364447494852436, 0.174724342663274, 0.77491961414791, 1, 0.856026785714286, 0.680759275237274, 0.850657108721625, 1, 1, 0, 0.851851851851852, 1, 0, 0.294954721862872, 0.819870009285051, 0, 0.734147168531706, 0.0135424091233072, 0.0189098998887653, 0.0101010101010101, 0, 0.296905222437137, 0.706837929731772, 0.269279393173198, 0.135379061371841, 0.158969804618117, 0.0902981940361193, 0.00423131170662906, 0, 0.374880611270296, 0.0425790754257908, 0.145542753183748, 0, 0.129032258064516, 0.260334645669291, 0, 0, 1, 0.175505350772889, 0.08248730964467, 0, 0.317217981340119, 0.614147909967846, 0, 0.264508928571429, 0.883520276100086, 0.0657108721624851, 0, 0.560229445506692)), row.names = c(NA, -80L), .Names = c("variable", "value"), class = "data.frame") 

Plot:

 library(ggplot2)

 q <- ggplot(data, aes(value, fill = variable))
 q + geom_density(alpha = 0.6, aes(y = ..density.. * 0.1), binwidth = 0.1) +
   theme_minimal() +
   scale_fill_manual(values = c("#D7191C", "#2B83BA")) +
   theme(legend.position = "bottom") +
   guides(fill = guide_legend(nrow = 1)) +
   labs(title = "Density Plot GrupoB", x = "Respuesta", y = "Density") +
   scale_x_continuous(breaks = seq(from = 0, to = 1.2, by = 0.1)) +
   geom_histogram(alpha = 0.6, aes(y = ..count.. / 40), binwidth = 0.1, position = "dodge")

The output I get is: [image]

1 answer

Your plot does exactly what you would expect from your data:

  • You are plotting data$value, which contains numeric values from 0 to 1, so you should expect the density curve to run from 0 to 1 as well.
  • You are plotting a histogram with a bin width of 0.1. The bins are closed at the lower end and open at the upper end. Thus, the binning you get in your case is [0, 0.1), [0.1, 0.2), ..., [0.9, 1.0), [1.0, 1.1). You have 17 values in your data that are equal to 1 and thus fall into the last bin, which is drawn from 1 to 1.1.
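The left-closed, right-open binning can be checked outside of ggplot2. The following is only a minimal sketch in base R: the vector x is a made-up sample (not the data from the question), and cut() with right = FALSE is used here as a stand-in for the binning described above. Newer ggplot2 releases may align and close bins differently by default; the boundary and closed arguments of geom_histogram control that.

```r
# Hypothetical sample on [0, 1]; the two values equal to 1 are the interesting ones.
x <- c(0, 0.05, 0.42, 0.95, 1, 1)

# Bin edges 0, 0.1, ..., 1.1 (bin width 0.1).
breaks <- seq(0, 1.1, by = 0.1)

# right = FALSE makes every bin closed on the left and open on the right: [a, b).
bins <- cut(x, breaks = breaks, right = FALSE)

# Number of values per bin.
counts <- table(bins)
print(counts)

# Index of the bin that receives the value 1: the 11th bin, i.e. [1.0, 1.1).
bin_of_one <- as.integer(cut(1, breaks = breaks, right = FALSE))
print(bin_of_one)
```

The values equal to 1 end up in the bin that starts at 1.0, not in [0.9, 1.0), which is why the histogram shows a bar to the right of 1.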

I think it's a bad idea to plot the histogram the way you do. The reason is that for a histogram the x axis is continuous, which means that a bar spanning a range on the x axis from, say, 0.1 to 0.2 represents the number of values between 0.1 (inclusive) and 0.2 (exclusive). Using dodge in this situation distorts the picture, because the bars no longer cover the correct range on the x axis: the two bars share a range that each of them should cover fully. This distortion is one of the reasons why the density curve does not seem to fit the histogram.

So what can you do about it? I can give you some suggestions, but maybe others have ideas ...

  • Instead of plotting the histograms next to each other using position="dodge", you can use faceting, that is, draw the histograms (and the corresponding density curves) in separate panels. This can be achieved by adding + facet_grid(variable~.) to your plot.

  • You can cheat a bit to make the last regular bin, [0.9, 1.0), include 1 (i.e. become [0.9, 1.0]). Just replace 1 in your data with 0.999 as follows: data$value[data$value==1]<-0.999. It is important that you do this only for the plot, where it effectively means that you are slightly redefining the binning. For any numerical evaluation you do, you should not make this replacement! (It will, for example, change the mean of data$value.)

  • Regarding the normalization of the density curve and the histogram: there is no need for the density curve to lie between 0 and 1. The constraint is that the integral over the density curve must equal 1. Thus, to make the density curve and the histogram comparable, the histogram should also have an integral of 1, which is achieved by additionally dividing the y value by the bin width. So you should use geom_density(alpha = 0.6,aes(y=..density..)) (I also removed binwidth=0.1 because it has no effect on geom_density) and geom_histogram(alpha = 0.6,aes(y=..count../40/.1),binwidth=0.1) (there is no need for position="dodge" once you use faceting). This, of course, leads to exactly the same relative normalization that you had, but it makes more sense because the integrals over the density curve and the histogram are both 1, as they should be.

  • The density curve is still not perfectly consistent with the histogram, and this is due to how the density estimate is calculated. I do not know this in detail and therefore, unfortunately, cannot explain it further. But you can get a better feeling for how it works by playing with the adjust parameter of geom_density. Smaller values make the curve less smooth, so it looks more like the histogram.
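The normalization argument and the effect of adjust can be illustrated with base R alone. This is a sketch under assumptions: x is a randomly generated stand-in sample of size 40 (not the data from the question), and stats::density() is used as an example of a kernel density estimator, without claiming that it reproduces every detail of what geom_density draws.

```r
# Hypothetical sample of size 40 on [0, 1]; the seed only makes the sketch reproducible.
set.seed(1)
x <- runif(40)

binwidth <- 0.1
breaks <- seq(0, 1, by = binwidth)

# Raw counts per bin.
h <- hist(x, breaks = breaks, plot = FALSE)

# Relative frequency: the bar heights sum to 1, but the area under the bars is only 0.1.
rel_freq <- h$counts / length(x)

# Density scale: divide by the bin width as well, so the area under the bars is 1.
hist_density <- h$counts / length(x) / binwidth
hist_area <- sum(hist_density * binwidth)

# Kernel density estimate; its area is approximately 1 (trapezoidal rule).
d <- density(x)
kde_area <- sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)

# adjust multiplies the bandwidth, so adjust = 0.2 gives a much less smooth curve.
d_narrow <- density(x, adjust = 0.2)

print(c(hist_area = hist_area, kde_area = kde_area))
print(c(bw_default = d$bw, bw_adjusted = d_narrow$bw))
```

Both areas come out at (approximately) 1, which is what makes the two layers comparable on a common y axis, and the adjusted bandwidth is one fifth of the default one.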

To put everything together, I applied all of my suggestions to your code, used adjust=0.2 in geom_density and plotted the result:

 library(ggplot2)

 # For plotting only: move the values equal to 1 into the [0.9, 1.0) bin
 data$value[data$value == 1] <- 0.999

 q <- ggplot(data, aes(value, fill = variable))
 q + geom_density(alpha = 0.6, aes(y = ..density..), adjust = 0.2) +
   theme_minimal() +
   scale_fill_manual(values = c("#D7191C", "#2B83BA")) +
   theme(legend.position = "bottom") +
   guides(fill = guide_legend(nrow = 1)) +
   labs(title = "Density Plot GrupoB", x = "Respuesta", y = "Density") +
   scale_x_continuous(breaks = seq(from = 0, to = 1.2, by = 0.1)) +
   # count / 40 / 0.1 puts the histogram on the density scale (area 1 per group)
   geom_histogram(alpha = 0.6, aes(y = ..count.. / 40 / .1), binwidth = 0.1) +
   facet_grid(variable ~ .)


Unfortunately, I cannot give you a more complete answer, but I hope that these ideas will give you a good start.
