Problems with ggplot and pgfSweave

I started using Sweave some time ago. However, like most people, I pretty soon ran into a serious problem: speed. Sweaving a large document takes a long time, which makes working efficiently rather difficult. Caching can significantly speed up data processing. However, the plots (especially ggplot ;)) still take too long to render. That's why I want to use pgfSweave.

After many, many hours, I finally managed to set up a working system with Eclipse / StatET / Texlipse. Then I wanted to convert an existing report for use with pgfSweave and got a nasty surprise: most of my ggplots no longer seem to work. For example, the following code works fine in the console and in Sweave:

    pl <- ggplot(plot_info, aes(elevation, area))
    pl <- pl + geom_point(aes(colour = que_id))
    print(pl)

When I run it with pgfSweave, I get this error:

    Error in if (width > 0) { : missing value where TRUE/FALSE needed
    In addition: Warning message:
    In if (width > 0) { : the condition has length > 1 and only the first element will be used
    Error in driver$runcode(drobj, chunk, chunkopts) :
      Error in if (width > 0) { : missing value where TRUE/FALSE needed

When I remove aes(...) from geom_point, the plot works fine with pgfSweave:

    pl <- ggplot(plot_info, aes(elevation, area))
    pl <- pl + geom_point()
    print(pl)

Edit: I investigated the problem further and was able to narrow it down to the tikz device.

This works great:

    quartz()
    pl <- ggplot(plot_info, aes(elevation, area))
    pl <- pl + geom_point(aes(colour = que_id))
    print(pl)

This gives the error above:

    tikz('myPlot.tex', standAlone = TRUE)
    pl <- ggplot(plot_info, aes(elevation, area))
    pl <- pl + geom_point(aes(colour = que_id))
    print(pl)
    dev.off()

This works great:

    tikz('myPlot.tex', standAlone = TRUE)
    pl <- ggplot(plot_info, aes(elevation, area))
    pl <- pl + geom_point()
    print(pl)
    dev.off()

I could reproduce this with 5 different ggplots. If the plot does not map colour (or size, alpha, ...), it works with tikz.

Q1: Does anyone have an explanation for this behavior?

In addition, caching code chunks that do not produce plots does not work very well. The following chunk takes no noticeable time with Sweave; with pgfSweave, it takes about 10 seconds.

    <<plot.opts,echo=FALSE,results=hide,cache=TRUE>>=
    #colour and plot options are globally set
    pal1 <- brewer.pal(8,"Set1")
    pal_seq <- brewer.pal(8,"YlOrRd")
    pal_seq <- c("steelblue1","tomato2")
    opt1 <- opts(panel.grid.major = theme_line(colour = "white"),
                 panel.grid.minor = theme_line(colour = "white"))
    sca_fill_cont_opt <- scale_fill_continuous(low="steelblue1", high="tomato2")
    ory <- geom_hline(yintercept=0,alpha=0.4,linetype=2)
    orx <- geom_vline(xintercept=0,alpha=0.4,linetype=2)
    ts1 <- 2.3
    ts2 <- 2.5
    ts3 <- 2.8
    ps1 <- 6
    offset_x <- function(x,y) 0.15*x/pmax(abs(x),abs(y))
    offset_y <- function(x,y) 0.05*y/pmax(abs(x),abs(y))
    plot_size <- 50*50
    @

This is also rather strange behavior, since the chunk only sets some variables for later use.

Q2: Does anyone have an explanation?

Q3: More generally, is anyone using pgfSweave successfully? By successfully, I mean that everything that works in Sweave also works in pgfSweave, with the added benefit of nice fonts and improved speed. ;)

Thanks so much for the answers!

3 answers

Q1: Does anyone have an explanation for this behavior?

There are three reasons why tikzDevice throws an error when trying to build your plot:

  • When you add an aesthetic mapping that creates a legend, such as aes(colour=que_id), ggplot2 uses the variable name as the legend title, in this case que_id.

  • tikzDevice passes all strings, such as legend titles, to LaTeX for typesetting.

  • In LaTeX, the underscore _ is used to denote a subscript. If the underscore is used outside of math mode, it causes an error.

When tikzDevice tries to calculate the height and width of the legend title, "que_id", it passes the string to LaTeX for typesetting and expects LaTeX to return the width and height of the string. LaTeX fails because the unescaped underscore appears in text outside math mode. tikzDevice then gets NULL for the string width instead of a number, which breaks the if (width > 0) check.
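To make the failure condition concrete, here is a small base-R sketch that flags strings which would reach LaTeX with an unescaped special character, as "que_id" does. The helper `has_unescaped_tex_special()` is hypothetical (it is not part of tikzDevice), and the character set it checks is an assumption covering the usual text-mode offenders.

```r
# Hypothetical helper, NOT a tikzDevice function: returns TRUE for strings
# containing one of _ % $ # & ^ { } that is not preceded by a backslash.
# Such strings cause a LaTeX error when typeset outside math mode.
has_unescaped_tex_special <- function(x) {
  # PCRE: start of string or any non-backslash char, then a special char
  grepl("(^|[^\\\\])[_%$#&^{}]", x, perl = TRUE)
}

# A raw variable name, a human-readable title, and an escaped variant
titles <- c("que_id", "Que ID", "que\\_id")
has_unescaped_tex_special(titles)
# [1]  TRUE FALSE FALSE
```

Only the raw variable name is flagged; the two legend titles suggested in the workarounds would pass.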

Ways to Avoid the Problem

  • Specify the legend title explicitly by adding a colour scale:

     p1 <- ggplot(plot_info, aes(elevation, area))
     p1 <- p1 + geom_point(aes(colour=que_id))
     # Add a name that is easier for humans to read than the variable name
     p1 <- p1 + scale_colour_brewer(name="Que ID")
     # Or, replace the underscore with the appropriate LaTeX escape sequence
     p1 <- p1 + scale_colour_brewer(name="que\\textunderscore id")
  • Use the string sanitization feature introduced in tikzDevice 0.5.0 (it was broken before 0.5.2). Currently, sanitization only escapes the following characters by default: %, $, {, } and ^. However, you can specify additional substitution pairs using the tikzSanitizeCharacters and tikzReplacementCharacters options:

     # Add underscores to the sanitization list
     options(tikzSanitizeCharacters = c('%','$','}','{','^', '_'))
     # '\\_{}' rather than a bare '\\textunderscore', which would merge with the letters that follow
     options(tikzReplacementCharacters = c('\\%','\\$','\\}','\\{', '\\^{}', '\\_{}'))
     # Turn on string sanitization when starting the plotting device
     tikz('myPlot.tex', standAlone = TRUE, sanitize = TRUE)
     print(p1)
     dev.off()
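The two options are positional pairs: the n-th entry of tikzSanitizeCharacters is replaced by the n-th entry of tikzReplacementCharacters. The base-R sketch below mimics that pairing so you can check what a legend title will look like before LaTeX sees it. `sanitize_tex()` is a hypothetical stand-in written for illustration, not tikzDevice's internal code, which may differ in detail. It uses `\\_{}` for the underscore so the escape cannot run into the letters that follow it.

```r
# Hypothetical stand-in for tikzDevice's sanitizer (illustration only).
# Replacements run in order; '}' and '{' are escaped BEFORE '^', so the
# braces introduced by '\\^{}' are not escaped a second time.
sanitize_tex <- function(x,
                         chars = c('%', '$', '}', '{', '^', '_'),
                         repl  = c('\\%', '\\$', '\\}', '\\{', '\\^{}', '\\_{}')) {
  stopifnot(length(chars) == length(repl))
  for (i in seq_along(chars)) {
    # fixed = TRUE: pattern and replacement are both taken literally
    x <- gsub(chars[i], repl[i], x, fixed = TRUE)
  }
  x
}

cat(sanitize_tex(c("que_id", "50%", "x^2")), sep = "\n")
```

Escaping braces before the caret is the design choice that matters here: reversing that order would turn `\^{}` into `\^\{\}`.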

We will release version 0.5.3 of tikzDevice in the next couple of weeks to address some annoying warning messages that now appear due to changes in the way R handles system(). I will add the following changes to the next version:

  • A better warning message when a NULL width indicates that something may be wrong with the plot text.

  • Add underscores and a few other characters to the default set of characters handled by the string sanitizer.

Hope this helps!


Q2: I maintain pgfSweave.

Here are the results of a test I ran:

     time R CMD Sweave time-test.Rnw
     real 0m1.133s
     user 0m1.068s
     sys  0m0.054s

     time R CMD pgfsweave time-test.Rnw
     real 0m2.941s
     user 0m2.413s
     sys  0m0.364s

     time R CMD pgfsweave time-test.Rnw
     real 0m2.457s
     user 0m2.112s
     sys  0m0.283s

I believe there are two reasons for the time difference, but verifying them will take more work:

  • pgfSweave performs a lot of checking and double-checking to make sure it does not redo expensive computations. The goal is to make it feasible to include more expensive computations and plots in a document. "Expensive" in this case means far more than the extra second or two spent on the checks.

As an example, consider the following test file, which shows the real benefit of caching:

     \documentclass{article}
     \begin{document}
     <<plot.opts,cache=TRUE>>=
     x <- Sys.sleep(10)
     @
     \end{document}

And the results:

     time R CMD Sweave time-test2.Rnw
     real 0m10.334s
     user 0m0.283s
     sys  0m0.047s

     time R CMD pgfsweave time-test2.Rnw
     real 0m12.032s
     user 0m1.356s
     sys  0m0.349s

     time R CMD pgfsweave time-test2.Rnw
     real 0m1.423s
     user 0m1.121s
     sys  0m0.266s
  • Sweave underwent some changes in R 2.12. These changes may have sped up the evaluation of code chunks, leaving pgfSweave behind for these smaller computations. This is worth looking into.
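The trade-off in the first point above (pay for a check on every run, skip the expensive evaluation when nothing changed) can be sketched in a few lines of base R. This is NOT how pgfSweave implements caching; `cached_eval()` is a hypothetical helper that hashes the chunk source with `tools::md5sum()` and reloads a saved result when the hash matches, assuming one `.rds` file per chunk is an acceptable cache layout.

```r
# Hypothetical chunk cache for illustration only (not pgfSweave's mechanism).
# The chunk source is hashed; a saved result for that hash is reloaded
# instead of re-running the code.
cached_eval <- function(code, cache_dir = tempdir()) {
  # tools::md5sum() hashes files, so write the source to a temporary file
  src_file <- tempfile(fileext = ".R")
  writeLines(code, src_file)
  key <- unname(tools::md5sum(src_file))
  unlink(src_file)

  cache_file <- file.path(cache_dir, paste0("chunk-", key, ".rds"))
  if (file.exists(cache_file)) {
    return(readRDS(cache_file))  # cache hit: evaluation is skipped
  }

  # Cache miss: evaluate in a fresh environment and save what it created
  env <- new.env(parent = globalenv())
  eval(parse(text = code), envir = env)
  result <- as.list(env)
  saveRDS(result, cache_file)
  result
}

# Usage, mirroring the Sys.sleep() test file: the first call sleeps,
# the second is served from the cache.
cache_dir <- tempfile("chunk-cache")
dir.create(cache_dir)
print(system.time(cached_eval("x <- Sys.sleep(1)", cache_dir)))
print(system.time(cached_eval("x <- Sys.sleep(1)", cache_dir)))
```

Even this toy version pays for writing, hashing, and a file lookup on every call, which is the same kind of fixed overhead that only pays off when the chunk itself is expensive.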

Q3: I use pgfSweave all the time for my work. Some changes to Sweave in R 2.12 cause minor problems with pgfSweave, but a new version that fixes them is on the way. The development version on GitHub (https://github.com/cameronbracken/pgfSweave) already includes the fixes. If you run into any other problems, I would be happy to help.


Q2: Do you use \pgfrealjobname{<DOCUMENTNAME>} in the preamble and the external=TRUE option for graphics chunks? I found that this increases speed (not on the first compilation, but on subsequent ones if the graphics have not changed). You will find more background in the pgfSweave vignette.

Q3: Everything works fine for me; I use Windows + Eclipse / StatET / Texlipse, just like you.
