Issue when passing a variable with dollar sign notation ($) to aes() in combination with facet_grid() or facet_wrap()

I am doing some analysis in ggplot2 for a project, and by chance I came across some behavior that (to me) is strange and that I cannot explain. When I write aes(x = cyl, ...), the plot looks different than it does when I pass the same variable as aes(x = mtcars$cyl, ...). When I remove facet_grid(am ~ .), both graphs are the same again. The code below is modeled after the code in my project and produces the same behavior:

    library(dplyr)
    library(ggplot2)

    data = mtcars
    test.data = data %>% select(-hp)

    ggplot(test.data, aes(x = test.data$cyl, y = mpg)) +
      geom_point() +
      facet_grid(am ~ .) +
      labs(title="graph 1 - dollar sign notation")

    ggplot(test.data, aes(x = cyl, y = mpg)) +
      geom_point() +
      facet_grid(am ~ .) +
      labs(title="graph 2 - no dollar sign notation")

Here is an image of graph 1:

graph 1 - dollar sign notation

Here is an image of graph 2:

graph 2 - no dollar sign notation

I found that I can work around this problem by using aes_string instead of aes and passing the variable names as strings, but I would like to understand why ggplot behaves this way. The problem also occurs in similar attempts with facet_wrap.

Thanks a lot in advance for any help! It bothers me when I do not understand what is going on.

r r-faq ggplot2
1 answer

tl;dr

Never use [ or $ inside aes().
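As a rough sketch of what to write instead, the two mappings below keep x tied to the data given to ggplot(), so it is subset together with the facet variable. The first uses a bare column name. The second covers the case where the column name is held in a string, using the .data pronoun; this assumes ggplot2 3.0.0 or later, and x_var is a hypothetical variable introduced only for illustration.

```r
library(ggplot2)

# Bare column name: looked up in the data passed to ggplot(),
# so it stays aligned with the facet variable am.
p_bare <- ggplot(mtcars, aes(x = cyl, y = mpg)) +
  geom_point() +
  facet_grid(am ~ .)

# Column name stored in a string (x_var is a hypothetical name).
# The .data pronoun looks the column up in the layer data as well.
x_var <- "cyl"
p_pronoun <- ggplot(mtcars, aes(x = .data[[x_var]], y = mpg)) +
  geom_point() +
  facet_grid(am ~ .)
```

Both plots should end up with the same x values in the same panels, because neither mapping reaches outside the data for its values.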


Consider this illustrative example, where the facet variable f is intentionally in a non-obvious order with respect to x:

    d <- data.frame(x=1:10, f=rev(letters[gl(2,5)]))

Now compare what happens with these two plots:

    p1 <- ggplot(d) +
      facet_grid(.~f, labeller = label_both) +
      geom_text(aes(x, y=0, label=x, colour=f)) +
      ggtitle("good mapping")

    p2 <- ggplot(d) +
      facet_grid(.~f, labeller = label_both) +
      geom_text(aes(d$x, y=0, label=x, colour=f)) +
      ggtitle("$ corruption")


We can get a better idea of what happens by looking at the data.frame that ggplot2 creates internally for each panel:

    ggplot_build(p1)[["data"]][[1]][,c("x","PANEL")]

        x PANEL
    1   6     1
    2   7     1
    3   8     1
    4   9     1
    5  10     1
    6   1     2
    7   2     2
    8   3     2
    9   4     2
    10  5     2

    ggplot_build(p2)[["data"]][[1]][,c("x", "PANEL")]

        x PANEL
    1   1     1
    2   2     1
    3   3     1
    4   4     1
    5   5     1
    6   6     2
    7   7     2
    8   8     2
    9   9     2
    10 10     2

The second plot has the wrong mapping, because when ggplot creates the data.frame for each panel, it picks the x values in the "wrong" order.

This happens because using $ breaks the link between the various variables being mapped: ggplot has to treat d$x as an independent variable which, for all it knows, could come from an arbitrary, unrelated source. Since the data.frame in this example is not ordered according to the factor f, the subset data.frames used internally for each panel end up with x values in the wrong order.
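The effect can be imitated in base R without ggplot2. The sketch below is only a toy model of the explanation above, not ggplot2's actual code: it assumes the panels are laid out by sorting rows on the facet level, and the names panel_order, by_panel, x_inside and x_detached are made up for illustration.

```r
# Same data as in the example above.
d <- data.frame(x = 1:10, f = rev(letters[gl(2, 5)]))

# Toy stand-in for the per-panel layout: rows grouped by facet level,
# so the "a" rows (panel 1) come before the "b" rows (panel 2).
panel_order <- order(d$f)
by_panel <- d[panel_order, ]

# A column that belongs to the data moves together with its rows.
x_inside <- by_panel$x

# A vector extracted with $ is detached from the data and
# keeps the original row order, whatever happens to the rows.
x_detached <- d$x

print(data.frame(f = by_panel$f, x_inside, x_detached))
```

In this toy model, x_inside lines up with the x column shown for p1 above, while x_detached lines up with the one shown for p2: the detached values are simply laid next to rows they no longer belong to.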
