How can I make a histogram in which the center of each bar lies along a common axis? It would look like a violin plot with stepped edges.
I would like to do this in lattice and do not mind writing custom panel functions, etc., but I would be glad to use base R graphics or even ggplot2. (I haven't tried ggplot2 yet, but I will take the plunge at some point.)
(Why do I want to do this? I think it could be a useful substitute for a violin plot when the data are discrete and fall on a handful [5-50] of evenly spaced numerical values. Each bin then represents a point. Of course, I could just create a normal histogram, but I think it is sometimes useful to display both a box-and-whisker plot and a violin plot. With discrete data at regular intervals, a symmetrical histogram with the same orientation as the boxplot lets you compare the detailed structure of the data with the boxplot, just as a violin plot does, and in that case the symmetrical histogram may be more informative than the violin plot. (A beanplot may be another alternative to what I just described, although in fact my data are not literally discrete; they just converge to near a series of regular values. This makes the R beanplot package less useful for me unless I normalize the values by mapping them to the nearest regular value.))
Below is a subset of 30 observations from some data generated by an agent-based simulation:
df30 <- data.frame(crime.v=c(0.2069526, 0.2063516, 0.06919754, 0.2080366, -0.06975912, 0.206277, 0.3457634, 0.2058985, 0.3428499, 0.3428159, 0.06746109, -0.07068694, 0.4826098, -0.06910966, 0.06769761, 0.2098732, 0.3482267, 0.3483602, 0.4829777, 0.06844112, 0.2093492, 0.4845478, 0.2093505, 0.3482845, 0.3459249, 0.2106339, 0.2098397, 0.4844956, 0.2108985, 0.2107984), bias=c("beast", "beast", "beast", "beast", "beast", "beast", "beast", "beast", "beast", "beast", "beast", "beast", "beast", "beast", "beast", "virus", "virus", "virus", "virus", "virus", "virus", "virus", "virus", "virus", "virus", "virus", "virus", "virus", "virus", "virus"))
A data frame named df with the full set of 600 observations is available as an Rdata file at this link: CVexample.rdata.
Each crime.v value lies near one of the following values, which I will call foci:
[1] -0.89115386 -0.75346155 -0.61576924 -0.47807693 -0.34038463 -0.20269232 -0.06500001
[8]  0.07269230  0.21038460  0.34807691  0.48576922  0.62346153  0.76115383  0.89884614
(The crime.v values are actually means of 13 variables, each of which can range from -1 to 1 but eventually converges to a value near 0.9 or -0.9. Means of 13 values that are each near 0.9 or -0.9 fall close to the foci. In practice, I determined appropriate values for the foci by examining the data, since there is some additional variation.)
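For illustration, mapping each observation to the nearest of the regularly spaced values listed above, and counting how many observations land on each, could be sketched as follows. The names `centers` and `nearest_center` are hypothetical and not part of the original data or any package; the five test values are taken from df30.

```r
# Regularly spaced values copied from the listing above
centers <- c(-0.89115386, -0.75346155, -0.61576924, -0.47807693, -0.34038463,
             -0.20269232, -0.06500001, 0.07269230, 0.21038460, 0.34807691,
             0.48576922, 0.62346153, 0.76115383, 0.89884614)

# Hypothetical helper: index of the closest center for each element of x
nearest_center <- function(x, centers) {
  vapply(x, function(v) which.min(abs(v - centers)), integer(1))
}

# A few values taken from df30
x <- c(0.2069526, 0.06919754, -0.06975912, 0.3457634, 0.4826098)

idx     <- nearest_center(x, centers)
snapped <- centers[idx]                               # values normalized to the grid
counts  <- tabulate(idx, nbins = length(centers))     # observations per center, zeros included
```

The `counts` vector has one entry per regular value, so it is the quantity a stepped, symmetric plot would need to display.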
A violin plot can be created using:
require(lattice)
bwplot(crime.v ~ bias, data=df30, ylim=c(-1,1), panel=panel.violin)
If you run this with the larger dataset, you will see that one of the resulting violin plots is multimodal and the other is not. However, this does not reflect a difference in the data underlying the two violin plots; as far as I can tell, it is an artifact of where the regularly spaced values listed above fall relative to the density estimate. I can smooth away the difference by changing the density parameters passed to panel.violin, but it would be clearer to show directly how many points fall in each cluster.
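For illustration only, the kind of plot described here could be sketched in base graphics roughly as follows. This is a minimal sketch under assumptions, not a definitive solution: the bin edges are placed midway between the regularly spaced values, the maximum half-width of 0.4 is arbitrary, and the variable names (`centers`, `breaks`, `counts`, `half`) are hypothetical. The data are the 30 observations of df30, rebuilt here so the block runs on its own.

```r
# The 30 observations from df30 (first 15 are "beast", last 15 are "virus")
crime.v <- c(0.2069526, 0.2063516, 0.06919754, 0.2080366, -0.06975912,
             0.206277, 0.3457634, 0.2058985, 0.3428499, 0.3428159,
             0.06746109, -0.07068694, 0.4826098, -0.06910966, 0.06769761,
             0.2098732, 0.3482267, 0.3483602, 0.4829777, 0.06844112,
             0.2093492, 0.4845478, 0.2093505, 0.3482845, 0.3459249,
             0.2106339, 0.2098397, 0.4844956, 0.2108985, 0.2107984)
bias <- rep(c("beast", "virus"), each = 15)

# Regularly spaced values around which the data cluster
centers <- c(-0.89115386, -0.75346155, -0.61576924, -0.47807693, -0.34038463,
             -0.20269232, -0.06500001, 0.07269230, 0.21038460, 0.34807691,
             0.48576922, 0.62346153, 0.76115383, 0.89884614)

# Assumption: one bin per center, with edges half a step on either side
step   <- mean(diff(centers))
breaks <- c(centers - step / 2, centers[length(centers)] + step / 2)

# Bin counts per group, computed without plotting
groups <- split(crime.v, bias)
counts <- lapply(groups, function(y) hist(y, breaks = breaks, plot = FALSE)$counts)
scale  <- max(unlist(counts))   # common scale so bar widths are comparable

# Empty plot, then one column of centered bars per group
plot(NA, xlim = c(0.5, length(groups) + 0.5), ylim = c(-1, 1),
     xaxt = "n", xlab = "bias", ylab = "crime.v")
axis(1, at = seq_along(groups), labels = names(groups))
for (i in seq_along(groups)) {
  keep <- counts[[i]] > 0                       # skip empty bins
  half <- 0.4 * counts[[i]][keep] / scale       # half-width proportional to count
  rect(i - half, head(breaks, -1)[keep],
       i + half, tail(breaks, -1)[keep], col = "grey80")
}
```

Each bar is centered on the group's position on the horizontal axis, so its width shows the number of points in that cluster directly, with no density smoothing involved. A lattice version would presumably need a custom panel function that computes the same counts and draws the rectangles with panel.rect.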
Thanks!