Symmetric, violin-plot-like histogram?

How can I make a histogram in which the center of each bar lies along a common axis? It would look like a violin plot with stepped edges.

I would like to do this in lattice, and I don't mind writing panel functions, etc., but I would be glad to use base R graphics or even ggplot2. (I haven't tackled ggplot2 yet, but I'll take the plunge at some point.)

(Why do I want to do this? I think it could be a useful substitute for a violin plot when the data are discrete and take several [5-50] evenly spaced numeric values. Each bar then represents one of those values. Of course, I could just create an ordinary histogram, but I think it is sometimes useful to display both a box-and-whisker plot and a violin plot. With discrete data at regular intervals, a symmetric histogram with the same orientation as a boxplot lets you compare the detailed structure of the data, as a violin plot does, and in that case a symmetric histogram may be more informative than a violin plot. (A beanplot may be another alternative to what I just described, although my data are not literally discrete: they just converge to values near a number of regularly spaced points. This makes the R beanplot package less useful to me unless I normalize the values by mapping each one to the nearest regular value.))

Below is a 30-observation subset of some data generated by an agent-based simulation:

    df30 <- data.frame(
        crime.v=c(0.2069526, 0.2063516, 0.06919754, 0.2080366, -0.06975912,
                  0.206277, 0.3457634, 0.2058985, 0.3428499, 0.3428159,
                  0.06746109, -0.07068694, 0.4826098, -0.06910966, 0.06769761,
                  0.2098732, 0.3482267, 0.3483602, 0.4829777, 0.06844112,
                  0.2093492, 0.4845478, 0.2093505, 0.3482845, 0.3459249,
                  0.2106339, 0.2098397, 0.4844956, 0.2108985, 0.2107984),
        bias=rep(c("beast", "virus"), each=15))  # 15 "beast", then 15 "virus"

A data frame named df containing the full set of 600 observations can be downloaded as an RData file from this link: CVexample.rdata.

Each crime.v value lies near one of the following values, which I will call foci:

    [1] -0.89115386 -0.75346155 -0.61576924 -0.47807693 -0.34038463 -0.20269232 -0.06500001
    [8]  0.07269230  0.21038460  0.34807691  0.48576922  0.62346153  0.76115383  0.89884614

(Each crime.v value is actually the mean of 13 variables whose values can range from -1 to 1, but which eventually converge to values near 0.9 or -0.9. Means of 13 values that are each near 0.9 or -0.9 fall fairly close to the foci. In practice, I determined suitable values for the foci by examining the data, since there was some additional variation.)
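To see why the means cluster at evenly spaced points, here is a small sketch under an idealizing assumption that is not how the simulation actually works: if each of the 13 variables had converged exactly to +0.9 or -0.9, the mean could only take the 14 values (2k - 13) * 0.9 / 13, where k = 0, 1, ..., 13 is the number of variables at +0.9. That gives exactly 14 points, matching the 14 foci listed above, spaced 2 * 0.9 / 13 (about 0.138) apart. The observed foci are spaced slightly more tightly (about 0.1377), which is consistent with the converged values not being exactly plus or minus 0.9.

```r
# Idealization (an assumption, not the actual simulation): each of the
# 13 variables has converged exactly to +0.9 or -0.9.
n_vars <- 13
v <- 0.9
k <- 0:n_vars                                    # number of variables at +0.9
ideal_means <- (k * v - (n_vars - k) * v) / n_vars

print(round(ideal_means, 4))                     # 14 evenly spaced possible means
# Spacing between adjacent possible means is 2 * v / n_vars (about 0.138)
print(diff(ideal_means)[1])
```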

A violin plot can be created with:

    require(lattice)
    bwplot(crime.v ~ bias, data=df30, ylim=c(-1,1), panel=panel.violin)

If you run this with the larger dataset, you will see that one of the resulting violin plots is multimodal and the other is not. However, this does not reflect a difference in the data underlying the two violin plots; as far as I can tell, it is an artifact of where the foci happen to fall relative to the density estimate. I can smooth away the difference by changing the density parameters passed to panel.violin, but it would be simpler to show directly how many points are in each cluster.
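If the goal is simply to see how many points sit in each cluster, one minimal sketch is to assign every observation to its nearest focus and tabulate by group. This assumes the foci listed above are the right cluster centers; the names foci, nearest, and counts are illustrative only, and df30 is redefined so the snippet runs on its own.

```r
# Same data as df30 in the question
df30 <- data.frame(
  crime.v = c(0.2069526, 0.2063516, 0.06919754, 0.2080366, -0.06975912,
              0.206277, 0.3457634, 0.2058985, 0.3428499, 0.3428159,
              0.06746109, -0.07068694, 0.4826098, -0.06910966, 0.06769761,
              0.2098732, 0.3482267, 0.3483602, 0.4829777, 0.06844112,
              0.2093492, 0.4845478, 0.2093505, 0.3482845, 0.3459249,
              0.2106339, 0.2098397, 0.4844956, 0.2108985, 0.2107984),
  bias = rep(c("beast", "virus"), each = 15)
)

# The foci listed in the question
foci <- c(-0.89115386, -0.75346155, -0.61576924, -0.47807693, -0.34038463,
          -0.20269232, -0.06500001,  0.07269230,  0.21038460,  0.34807691,
           0.48576922,  0.62346153,  0.76115383,  0.89884614)

# Index of the nearest focus for each observation
nearest <- vapply(df30$crime.v, function(v) which.min(abs(v - foci)), integer(1))

# Counts per group (rows) and focus (columns); empty foci are kept as zeros
counts <- table(df30$bias,
                factor(nearest, levels = seq_along(foci),
                       labels = sprintf("%.3f", foci)))
print(counts)
```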

Thanks!

2 answers

Here is one way to do it with base graphics:

    tmp <- tapply( iris$Petal.Length, iris$Species,
                   function(x) hist(x, plot=FALSE) )
    plot.new()
    tmp.r <- do.call( range, lapply(tmp, `[[`, 'breaks') )
    plot.window(xlim=c(1/2,length(tmp)+1/2), ylim=tmp.r)
    abline(v=seq_along(tmp))
    for( i in seq_along(tmp) ) {
        h <- tmp[[i]]
        rf <- h$counts/sum(h$counts)
        rect( i-rf/2, head(h$breaks, -1), i+rf/2, tail(h$breaks, -1) )
    }
    axis(1, at=seq_along(tmp), labels=names(tmp))
    axis(2)
    box()

You can customize the various parts to your liking, and all of this can easily be wrapped in a function.
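As a sketch of wrapping the code above in a reusable function (the name centered_hist and its arguments are hypothetical, not part of either answer), the version below also accepts a breaks argument, so you can pass a common vector of breaks and have every group share the same bins, and it forwards extra arguments such as col to rect().

```r
# Hypothetical helper: one centered ("symmetric") histogram per group,
# drawn with base graphics. Adapted from the code above.
centered_hist <- function(values, groups, breaks = "Sturges", ...) {
  # One hist object per group; plot = FALSE only computes breaks and counts
  hs <- tapply(values, groups,
               function(x) hist(x, breaks = breaks, plot = FALSE))
  yr <- range(unlist(lapply(hs, `[[`, "breaks")))
  plot.new()
  plot.window(xlim = c(0.5, length(hs) + 0.5), ylim = yr)
  abline(v = seq_along(hs), col = "grey")        # common central axes
  for (i in seq_along(hs)) {
    h <- hs[[i]]
    rf <- h$counts / sum(h$counts)               # relative frequencies
    # Each bar spans i - rf/2 to i + rf/2, so it is centered on x = i
    rect(i - rf / 2, head(h$breaks, -1), i + rf / 2, tail(h$breaks, -1), ...)
  }
  axis(1, at = seq_along(hs), labels = names(hs))
  axis(2)
  box()
  invisible(hs)                                  # return the hist objects
}

hs <- centered_hist(iris$Petal.Length, iris$Species, col = "pink")
```

Because each bar's width is a relative frequency (at most 1) and the groups are drawn one unit apart, bars from adjacent groups can at most touch, never overlap.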


Here is a lattice panel function based on @GregSnow's base-graphics answer. I could not have done this without Greg providing a solid starting point, so all credit goes to Greg. My panel function is not very sophisticated and may well break in some simple case, but it handles both horizontal and vertical orientations and lets you either supply a vector of breaks or leave it out. It also removes empty bins at the ends. The panel function uses the default breaks behavior of hist, not of histogram, which is more complicated. Comments on best practices are welcome.

Since symmetric or centered histograms don't have a name, as far as I know, and they resemble the Tower of Hanoi puzzle, perhaps they should be called Tower of Hanoi histograms. Hence the function is called panel.hanoi.

A simple use case, with df30 as defined in the question:

 bwplot(crime.v ~ bias, data=df30, panel=panel.hanoi) 

Here is a more complex example using the full data set linked in the question (the resulting graphic is at the end of this answer).

    bwplot(crime.v ~ bias, data=df, ylim=c(-1,1), pch="|", coef=0,
           panel=function(...) {
               panel.hanoi(col="pink", breaks=cv.ints, ...)
               panel.bwplot(...)
           })

In this example, ylim is added so that the plot spans -1 to 1, and the panel function overlays a bwplot on top of the Hanoi plot. pch and coef affect the appearance of the bwplot. This example also uses the following breaks, which center each bin of the Hanoi plot on the places where my data points tend to lie (see the original question):

 cv.ints <- c(-1.000000000, -0.960000012, -0.822307704, -0.684615396, -0.546923088, -0.409230781, -0.271538473, -0.133846165, 0.003846142, 0.141538450, 0.279230758, 0.416923065, 0.554615373, 0.692307681, 0.829999988, 0.967692296, 1.000000000) 
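The cv.ints values appear to be the midpoints between adjacent foci from the question, plus one edge half a step beyond each outermost focus, bounded by -1 and 1. Assuming that is how they were built, they can be computed from the foci instead of typed in by hand; the sketch below does that and checks the result against cv.ints (the name breaks_from_foci is illustrative only).

```r
# Foci as printed in the question
foci <- c(-0.89115386, -0.75346155, -0.61576924, -0.47807693, -0.34038463,
          -0.20269232, -0.06500001,  0.07269230,  0.21038460,  0.34807691,
           0.48576922,  0.62346153,  0.76115383,  0.89884614)
step <- mean(diff(foci))        # spacing between adjacent foci, about 0.1377

# Edges halfway between adjacent foci, one extra edge beyond the last focus,
# and -1 and 1 at the ends so the full range of crime.v is covered
breaks_from_foci <- c(-1, foci - step / 2, foci[length(foci)] + step / 2, 1)

cv.ints <- c(-1.000000000, -0.960000012, -0.822307704, -0.684615396,
             -0.546923088, -0.409230781, -0.271538473, -0.133846165,
              0.003846142,  0.141538450,  0.279230758,  0.416923065,
              0.554615373,  0.692307681,  0.829999988,  0.967692296,
              1.000000000)

print(all.equal(breaks_from_foci, cv.ints, tolerance = 1e-6))
```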

Here is the panel function:

    panel.hanoi <- function(x, y, horizontal, breaks="Sturges", ...) {  # "Sturges" is hist() default
        if (horizontal) {
            condvar <- y  # conditioning ("independent") variable
            datavar <- x  # data ("dependent") variable
        } else {
            condvar <- x
            datavar <- y
        }
        conds <- sort(unique(condvar))
        # loop through the possible values of the conditioning variable
        for (i in seq_along(conds)) {
            # use base hist(ogram) function to extract some information
            h <- hist(datavar[condvar == conds[i]], plot=FALSE, breaks=breaks)
            # strip outer counts == 0, and corresponding bins
            brks.cnts <- stripOuterZeros(h$breaks, h$counts)
            brks <- brks.cnts[[1]]
            cnts <- brks.cnts[[2]]
            halfrelfs <- (cnts/sum(cnts))/2  # i.e. half of the relative frequency
            center <- i
            # All of the variables passed to panel.rect will usually be vectors,
            # and panel.rect will therefore make multiple rectangles.
            if (horizontal) {
                panel.rect(head(brks, -1), center - halfrelfs,
                           tail(brks, -1), center + halfrelfs, ...)
            } else {
                panel.rect(center - halfrelfs, head(brks, -1),
                           center + halfrelfs, tail(brks, -1), ...)
            }
        }
    }

    # strip runs of zero counts at either end of the data,
    # along with the corresponding breaks
    stripOuterZeros <- function(brks, cnts) {
        do.call("stripLeftZeros", stripRightZeros(brks, cnts))
    }

    stripLeftZeros <- function(brks, cnts) {
        if (cnts[1] == 0) {
            stripLeftZeros(brks[-1], cnts[-1])
        } else {
            list(brks, cnts)
        }
    }

    stripRightZeros <- function(brks, cnts) {
        len <- length(cnts)
        if (cnts[len] == 0) {
            stripRightZeros(brks[-(len+1)], cnts[-len])
        } else {
            list(brks, cnts)
        }
    }

Tower of Hanoi histograms with overlaid bwplots
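Since comments on best practices were invited: the three recursive helpers can be replaced by one function that locates the first and last nonzero counts with which(). This is a sketch, and stripOuterZeros2 is a hypothetical name; it returns the same list(breaks, counts) structure, so it could stand in for stripOuterZeros in panel.hanoi. Interior zero counts are kept, as in the original, and an all-zero input returns empty vectors instead of failing in the recursion.

```r
# Hypothetical non-recursive alternative to stripOuterZeros():
# keep bins from the first to the last nonzero count (interior zeros stay).
stripOuterZeros2 <- function(brks, cnts) {
  nz <- which(cnts != 0)
  if (length(nz) == 0) return(list(brks[0], cnts[0]))   # nothing to keep
  keep <- seq(nz[1], nz[length(nz)])                     # bins to keep
  # bin j is bounded by brks[j] and brks[j + 1]
  list(brks[c(keep, keep[length(keep)] + 1)], cnts[keep])
}

res <- stripOuterZeros2(brks = c(0, 1, 2, 3, 4, 5), cnts = c(0, 2, 0, 3, 0))
print(res)
```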

