Plot / ggplot2 - Fill an area with too many points

Final implementation - not finished, but lights up in the right way

Idea / Problem: You have a plot with many overlapping points and you want to replace them with a simple area, so increase productivity by viewing the plot.

Possible implementation: Calculate the matrix of distances between all points and connect all points below a given distance.

Tolo / Not completed:. Currently, it works for manually set distances depending on the size of the printing area. I stayed here because the result did not fit my aesthetic sense.

Minimal example with intermediate charts

set.seed(074079089) n.points <- 3000 mat <- matrix(rnorm(n.points*2, 0,0.2), nrow=n.points, ncol=2) colnames(mat) <- c("x", "y") d.mat <- dist(mat) fit.mat <-hclust(d.mat, method = "single") lims <- c(-1,1) real.lims <- lims*1.1 ## ggplot invokes them approximately # An attempt to estimate the point-sizes, works for default pdfs pdf("test.pdf") cutsize <- sum(abs(real.lims))/100 groups <- cutree(fit.mat, h=cutsize) # cut tree at height cutsize # plot(fit.mat) # display dendogram # draw dendogram with red borders around the 5 clusters # rect.hclust(fit.mat, h=cutsize, border="red") library(ggplot2) df <- data.frame(mat) df$groups <- groups plot00 <- ggplot(data=df, aes(x,y, col=factor(groups))) + geom_point() + guides(col=FALSE) + xlim(lims) + ylim(lims)+ ggtitle("Each color is a group") pdf("plot00.pdf") print(plot00) dev.off() 

plot00 - points with group color

 # If less than 4 points are connected, show them seperately t.groups <- table(groups) # how often which group drop.group <- as.numeric(names(t.groups[t.groups<4])) # groups with less than 4 points are taken together groups[groups %in% drop.group] <- 0 # in group 0 df$groups <- groups plot01 <- ggplot(data=df, aes(x,y, col=factor(groups))) + geom_point() + xlim(lims)+ ylim(lims) + scale_color_hue(l=10) pdf("plot01.pdf") print(plot01) dev.off() 

plot01 - all individual points in one group

 find_hull <- function(df_0) { return(df_0[chull(df_0$x, df_0$y), ]) } library(plyr) single.points.df <- df[df$groups == 0 , ] connected.points.df <- df[df$groups != 0 , ] hulls <- ddply(connected.points.df, "groups", find_hull) # for all groups find a hull plot02 <- ggplot() + geom_point(data=single.points.df, aes(x,y, col=factor(groups))) + xlim(lims)+ ylim(lims) + scale_color_hue(l=10) pdf("plot02.pdf") print(plot02) dev.off() 

plot02 - only

 plot03 <- plot02 for(grp in names(table(hulls$groups))) { plot03 <- plot03 + geom_polygon(data=hulls[hulls$groups==grp, ], aes(x,y), alpha=0.4) } # print(plot03) plot01 <- plot01 + theme(legend.position="none") plot03 <- plot03 + theme(legend.position="none") # multiplot(plot01, plot03, cols=2) pdf("plot03.pdf") print(plot03) dev.off() 

plot03 - final

Starting question

I have a (maybe odd) question.

In some stories, I have thousands of points in my analysis. To display them, the computer takes a lot of time, because there are so many of them. After that, many of these points can overlap, I have a filled area (this is normal!). To save time / effort, it would be useful to simply fill in this area, but the construction of each point in itself.

I know that there are possibilities in heatmaps and so on, but this is not the idea that I have in mind. My idea is something like:

 #plot00: ggplot with many many points and a filled area of points plot00 <- plot00 + fill.crowded.areas() # with plot(), I sadly have an idea how to manage it 

Any ideas? Or can it do nothing at any time?

 # Example code # install.packages("ggplot2") library(ggplot2) n.points <- 10000 mat <- matrix(rexp(n.points*2), nrow=n.points, ncol=2) colnames(mat) <- c("x", "y") df <- data.frame(mat) plot00 <- ggplot(df, aes(x=x, y=y)) + theme_bw() + # white background, grey strips geom_point(shape=19)# Aussehen der Punkte print(plot00) 

ggplot2

 # NO ggplot2 plot(df, pch=19) 

plot

Edit:
In order to have dense patches, as mentioned by fdetsch (how can I tag a name?), There are some questions regarding this topic. But that is not what I want for sure. I know that my concern is a little strange, but densities make the plot more busy sometimes as needed.

Links to topics with densities:

Scatterplot with too many dots
High Density Charts

+6
source share
3 answers

You can use a reliable estimate to estimate the location of most of your points and plot the convex hull of the points as follows:

 set.seed(1337) n.points <- 500 mat <- matrix(rexp(n.points*2), nrow=n.points, ncol=2) colnames(mat) <- c("x", "y") df <- data.frame(mat) require(robustbase) my_poly <- function(data, a, ...){ cov_rob = covMcd(data, alpha = a) df_rob = data[cov_rob$best,] ch = chull(df_rob$x, df_rob$y) geom_polygon(data = df_rob[ch,], aes(x,y), ...) } require(ggplot2) ggplot() + geom_point(data=df, aes(x,y)) + my_poly(df, a = 0.5, fill=2, alpha=0.5) + my_poly(df, a = 0.7, fill=3, alpha=0.5) 

This leads to:

enter image description here

covMcd controlling the alpha value of covMcd , you can increase / decrease the size of the area. See ?robustbase::covMcd . Btw .: Mcd stands for minimal covariance determinant. Instead, you can also use MASS::cov.mve to calculate the minimum valum ellipsoid with MASS::cov.mve(..., quantile.used= -person of the points in the ellipsoid.

For classes 2+:

 my_poly2 <- function(data, a){ cov_rob = covMcd(data, alpha = a) df_rob = data[cov_rob$best,] ch = chull(df_rob[,1], df_rob[,2]) df_rob[ch,] } ggplot(faithful, aes(waiting, eruptions, color = eruptions > 3)) + geom_point() + geom_polygon(data = my_poly2(faithful[faithful$eruptions > 3,], a=0.5), aes(waiting, eruptions), fill = 2, alpha = 0.5) + geom_polygon(data = my_poly2(faithful[faithful$eruptions < 3,], a=0.5), aes(waiting, eruptions), fill = 3, alpha = 0.5) 

enter image description here

Or, if you're fine with unresponsive ellipsoids, check out stat_ellipse

+1
source

How about using panel.smoothScatter from the grid ? It displays a certain number of points in regions of low density (see the "nrpoints" argument) and wherever point densities are displayed, rather than single (and possibly overlapping) points, which gives more meaningful information about your data. See Also ?panel.smoothScatter for more information.

 ## load 'lattice' library(lattice) ## display point densities xyplot(y ~ x, data = df, panel = function(x, y, ...) { panel.smoothScatter(x, y, nbin = 250, ...) }) 

enter image description here

+4
source

You mean something like a convex body of your points:

Convex hull of points

 set.seed(1337) n.points <- 100 mat <- matrix(rexp(n.points*2), nrow=n.points, ncol=2) colnames(mat) <- c("x", "y") df <- data.frame(mat) ch <- chull(df$x, df$y) # This computes the convex hull require(ggplot2) ggplot() + geom_point(data=df, aes(x,y)) + geom_polygon(data = df[ch,], aes(x,y), alpha=0.5) 
+1
source

All Articles