Maximum graph points in R?

I've run into a few situations where I want to plot more points than I really ought to. The main constraint is that when I share my plots with people or embed them in documents, they take up too much space. It is easy enough to randomly sample rows of a data frame.

If I want a truly random sample for the plot, it's easy enough to write:

 qplot(x, y, data=myDf[sample(1:nrow(myDf), 1000), ])

However, I was wondering whether there are more efficient (shape-preserving) ways to reduce the number of points in the plot, so that the plot still accurately reflects the actual data. Here is an example: suppose I am plotting the CCDF of something like a heavy-tailed distribution.

 ccdf <- function(myList, density=FALSE) {
   # generates the CCDF of a list or vector
   freqs = table(myList)
   X = rev(as.numeric(names(freqs)))
   Y = cumsum(rev(as.list(freqs)))
   data.frame(x=X, count=Y)
 }

 qplot(x, count, data=ccdf(rlnorm(10000, 3, 2.4)), log='xy')

This produces a plot in which the points become increasingly dense along the x and y axes. It would be ideal to draw fewer points at large x or y values.

Does anyone have any tips or suggestions to solve such problems?

Thanks, -e

+7
r plot
4 answers

Here is one possible solution for downsampling the plot along the x axis when it is log-transformed. It takes the log of each x value, rounds it, and picks the median x value within each resulting bin:

 downsampled_qplot <- function(x, y, data, rounding=0, ...) {
   # assumes we are plotting with log='xy' or log='x'
   group = factor(round(log(data$x), rounding))
   # within each log-x bin, keep only the row with the median x value
   d <- do.call(rbind, by(data, group, function(X) X[order(X$x)[ceiling(nrow(X)/2)], ]))
   qplot(x, count, data=d, ...)
 }

Using the definition of ccdf() above, we can compare the original plot of the CCDF with the downsampled versions:

 myccdf = ccdf(rlnorm(10000, 3, 2.4))
 qplot(x, count, data=myccdf, log='xy', main='original')

 downsampled_qplot(x, count, data=myccdf, log='xy', rounding=1, main='rounding = 1')

 downsampled_qplot(x, count, data=myccdf, log='xy', rounding=0, main='rounding = 0')

As PDFs, the original plot takes up 640K, while the downsampled versions take up 20K and 8K, respectively.

+4

I tend to use png files rather than vector formats such as pdf or eps in this situation. The files are much smaller, though you lose some resolution.
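As a rough sketch of that approach (the file name and dimensions here are arbitrary, not from the original answer), rendering to the png device instead of pdf looks like:

```r
# open a fixed-size raster device instead of a PDF;
# every point is rasterized, so file size no longer grows with point count
png("scatter.png", width = 800, height = 600)
plot(rnorm(10000), rnorm(10000), pch = 20)
dev.off()  # close the device to flush the file to disk
```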

If it is a more conventional scatterplot, then using translucent colours also helps with the overplotting problem. For example,

 x <- rnorm(10000)
 y <- rnorm(10000)
 qplot(x, y, colour=I(alpha("blue", 1/25)))
+8

In addition to Rob's suggestions, a plotting function I like because it does the "thinning" for you is hexbin; there is an example at the R Graph Gallery.
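A minimal sketch of what that looks like, assuming the hexbin package (from CRAN) is installed; the bin count of 50 is just an illustrative choice:

```r
library(hexbin)

# the same kind of overplotted data as in the earlier answers
x <- rnorm(10000)
y <- rnorm(10000)

# aggregate the 10000 points into a grid of hexagonal bins;
# plotting draws one hexagon per occupied bin, shaded by count
bin <- hexbin(x, y, xbins = 50)
plot(bin)
```

The plot's size is then bounded by the number of bins rather than the number of points.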

+5

I would either generate image files (the png or jpeg devices), as Rob suggests, or make a 2D histogram. An alternative to the 2D histogram is a smoothed scatterplot; it produces a similar graphic, but with a smoother transition from dense to sparse regions of the space.

If you've never seen addictedtor before, it's worth a look. It has some very nice graphics generated in R, with images and sample code.

Here is sample code from addictedtor:

2-d histogram:

 require(gplots)

 # example data, bivariate normal, no correlation
 x <- rnorm(2000, sd=4)
 y <- rnorm(2000, sd=1)

 # separate scales for each axis, this looks circular
 hist2d(x, y, nbins=50, col=c("white", heat.colors(16)))
 rug(x, side=1)
 rug(y, side=2)
 box()

smoothscatter:

 library("geneplotter")  ## from BioConductor
 require("RColorBrewer") ## from CRAN

 x1 <- matrix(rnorm(1e4), ncol=2)
 x2 <- matrix(rnorm(1e4, mean=3, sd=1.5), ncol=2)
 x <- rbind(x1, x2)

 layout(matrix(1:4, ncol=2, byrow=TRUE))
 op <- par(mar=rep(2, 4))
 smoothScatter(x, nrpoints=0)
 smoothScatter(x)
 smoothScatter(x, nrpoints=Inf,
               colramp=colorRampPalette(brewer.pal(9, "YlOrRd")),
               bandwidth=40)
 colors <- densCols(x)
 plot(x, col=colors, pch=20)
 par(op)
+2
