In stat_summary_hex, why do hexagons overlap if z is a factor?

Question


In the dataset below, thing1 is numeric and thing2 is a factor (but otherwise identical to thing1). For simplicity, the summary function just returns the maximum value in each bin. When z is a factor, the hexagons overlap. Does anyone know why?

library(ggplot2)
library(hexbin)

DF <- data.frame(xpos = rnorm(1000),
                 ypos = rnorm(1000),
                 thing1 = rep(1:9, length.out = 100),
                 thing2 = as.factor(rep(1:9, length.out = 100)))

ggplot(DF, aes(x = xpos, y = ypos, z = thing1)) +
  stat_summary_hex(fun = function(x) { x[which.max(x)] })

ggplot(DF, aes(x = xpos, y = ypos, z = thing2)) +
  stat_summary_hex(fun = function(x) { x[which.max(x)] })

[Figures: resulting plot for thing1 (numeric); resulting plot for thing2 (factor)]


r ggplot2

jflournoy Jun 28 '13 at 21:11


1 answer

datanalytics.com · Answer 1 · 2013-12-08T22:06:55+0000

As far as I know, there are two functions in R for hexagonal binning: hexBinning and geom_hex, in the fMultivar and ggplot2 packages respectively. Both position the centers of the hexagons relative to the coordinates of the lower-left point in the sample.

This means that if you split your sample (by a factor or, in my case, inside a mapreduce task), each subset gets its own grid with a different origin. The grids no longer line up with one another, so the hexagons end up off-center and overlap.

So, I implemented my own hexbin function, which assumes (0,0) as the center of the grid (i.e. if there were points around (0,0), the corresponding hexagon would be centered there) and only requires r (the radius of the hexagon) as a parameter.

The implementation is here (sorry, Spanish text!). Moreover, my implementation has no explicit loops: it is fully vectorized.
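The linked implementation is not reproduced on this page. As a rough illustration of the idea only, here is a minimal vectorized sketch in R, assuming pointy-topped hexagons of circumradius r on a grid that has a hexagon centered at (0,0). The function name hex_center and its return format are hypothetical, not taken from the answer's code or from any package.

```r
# Hypothetical helper (an assumption, not the answer's actual code):
# return the centre of the hexagon containing each point, on a grid
# anchored at (0, 0) whose hexagons have circumradius r.
hex_center <- function(x, y, r) {
  dx <- sqrt(3) * r  # horizontal spacing between centres within a row
  dy <- 3 * r        # vertical spacing between rows of the same sub-lattice

  # Candidate A: nearest centre on the sub-lattice that contains (0, 0)
  ax <- round(x / dx) * dx
  ay <- round(y / dy) * dy

  # Candidate B: nearest centre on the sub-lattice shifted by (dx / 2, dy / 2)
  bx <- (round(x / dx - 0.5) + 0.5) * dx
  by <- (round(y / dy - 0.5) + 0.5) * dy

  # Keep whichever candidate is closer; no explicit loops are needed
  use_a <- (x - ax)^2 + (y - ay)^2 <= (x - bx)^2 + (y - by)^2
  data.frame(cx = ifelse(use_a, ax, bx), cy = ifelse(use_a, ay, by))
}

# Usage: binning a subset gives the same centres as binning the full sample
set.seed(1)
x <- rnorm(1000)
y <- rnorm(1000)
g <- rep(1:9, length.out = 1000)
all_bins <- hex_center(x, y, r = 0.25)
sub_bins <- hex_center(x[g == 1], y[g == 1], r = 0.25)
same_grid <- identical(all_bins$cx[g == 1], sub_bins$cx) &&
  identical(all_bins$cy[g == 1], sub_bins$cy)
```

Because each center depends only on the point itself and on r, never on the extent of the sample, every subset shares one grid. A per-bin summary could then be computed with something like aggregate() on cx and cy.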
