The for loop adds only the last ggplot layer

Question


Summary: When I use a for loop to add layers to a violin plot (in ggplot2), only the layer created by the last iteration of the loop is added. However, explicit code that mimics what the loop generates adds all of the layers.

Details: I am trying to create scripted graphs with overlapping layers to show how the distributions of estimates do or do not overlap between places, for several questions from locally stratified surveys. I want to be able to include any number of places, so I use a data frame with one column per place, and I'm trying to use a for loop to create one ggplot layer per place. But the loop only adds the layer from its final iteration.

This code illustrates the problem and some suggested approaches that did not work:

```r
library(ggplot2)

# Create a data frame with 500 random normal values for responses
# to 3 survey questions from two cities
topic <- c("Poverty %", "Mean Age", "% Smokers")
place <- c("Chicago", "Miami")
n <- 500
mean <- c(35, 40, 58, 50, 25, 20)
var  <- c( 7, 1.5, 3, .25, .5, 1)
df <- data.frame(
  topic = rep(topic, rep(n, length(topic))),
  c(rnorm(n, mean[1], var[1]), rnorm(n, mean[3], var[3]), rnorm(n, mean[5], var[5])),
  c(rnorm(n, mean[2], var[2]), rnorm(n, mean[4], var[4]), rnorm(n, mean[6], var[6]))
)
names(df)[2:dim(df)[2]] <- place  # Name the last two columns after the corresponding place
head(df)

# This "for" loop seems to execute only the final iteration (i.e., where p = 3)
g <- ggplot(df, aes(factor(topic), df[,2]))
for (p in 2:dim(df)[2]) {
  g <- g + geom_violin(aes(y = df[,p], colour = place[p-1]), alpha = 0.3)
}
g

# But mimicking what the for loop does in explicit code works fine,
# resulting in both places being displayed in the graph.
g <- ggplot(df, aes(factor(topic), df[,2]))
g <- g + geom_violin(aes(y = df[,2], colour = place[2-1]), alpha = 0.3)
g <- g + geom_violin(aes(y = df[,3], colour = place[3-1]), alpha = 0.3)
g

## Per http://stackoverflow.com/questions/18444620/set-layers-in-ggplot2-via-loop , I tried
g <- ggplot(df, aes(factor(topic), df[,2]))
for (p in 2:dim(df)[2]) {
  df1 <- df[,c(1,p)]
  g <- g + geom_violin(aes(y = df1[,2], colour = place[p-1]), alpha = 0.3)
}
g
# but got the same undesired result

# Per http://stackoverflow.com/questions/15987367/how-to-add-layers-in-ggplot-using-a-for-loop , I tried
g <- ggplot(df, aes(factor(topic), df[,2]))
for (p in names(df)[-1]) {
  cat(p, "\n")
  g <- g + geom_violin(aes_string(y = p, colour = p), alpha = 0.3)
  # produced this error: Error in unit(tic_pos.c, "mm") : 'x' and 'units' must have length > 0
  # g <- g + geom_violin(aes_string(y = p), alpha = 0.3)
  # produced this error: Error: stat_ydensity requires the following missing aesthetics: y
}
g
# but that failed to produce any graphic, per the errors noted in the "for" loop above
```


for-loop r ggplot2

user3799203 Oct 7 '14 at 12:18


3 answers

Just avoid using a for loop. How about lapply instead:

```r
g <- g + lapply(2:ncol(df), function(p) {
  geom_violin(aes(y = df[,p], colour = place[p-1]), alpha = 0.3)
})
```

EDIT: That actually doesn't work. I had p <- 2 in my workspace before running it, and it produced a graph with only the Chicago data. The principle should still work, though (although melt is probably the better option):

```r
g <- ggplot(df, aes(x = factor(topic)))
g + lapply(place, function(p) {
  geom_violin(aes_string(y = p), alpha = 0.3, color = which(p == place))
})
```
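Why the first lapply() version misbehaved probably depends on the ggplot2 version. Since ggplot2 3.0.0, aes() is built on rlang quosures, which remember the environment they were created in; each call of the anonymous function passed to lapply() gets its own environment holding its own p, so in current versions that first idea should keep the two columns apart. A minimal sketch under that assumption, using a smaller, made-up version of the question's data (seed, sample size, means and standard deviations are illustrative only), which inspects the built layers instead of drawing them:

```r
library(ggplot2)

# Illustrative data in the question's wide layout: one column per place
set.seed(42)
n <- 200
topic <- c("Poverty %", "Mean Age", "% Smokers")
place <- c("Chicago", "Miami")
df <- data.frame(
  topic   = rep(topic, each = n),
  Chicago = rnorm(3 * n, mean = 35, sd = 7),
  Miami   = rnorm(3 * n, mean = 40, sd = 1.5)
)

# Each call of the anonymous function has its own environment, so the
# quosures created by aes() see that call's p (assumes ggplot2 >= 3.0.0)
layers <- lapply(2:ncol(df), function(p) {
  geom_violin(aes(y = df[, p], colour = place[p - 1]), alpha = 0.3)
})
g <- ggplot(df, aes(x = factor(topic))) + layers

# Build the plot without drawing it and collect the colour used by each layer
built <- ggplot_build(g)
layer_colours <- sapply(built$data, function(d) unique(d$colour))
layer_colours
```

If the two entries of layer_colours differ, each layer picked up its own p rather than the last one.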


shadow Oct 7 '14 at 12:38


You can do this without a loop:

```r
library(reshape2)  # provides melt()
df.2 <- melt(df)
gg <- ggplot(df.2, aes(x = topic, y = value))
gg <- gg + geom_violin(position = "identity", aes(color = variable), alpha = 0.3)
gg
```


hrbrmstr Oct 7 '14 at 13:45


jlhoward · Accepted Answer · 2014-10-07T23:15:04+0000

The reason this happens is ggplot's "lazy evaluation". This is a common problem when ggplot is used this way (building the layers separately in a loop, rather than letting ggplot do it for you, as in @hrbrmstr's solution).

ggplot stores aes(...) arguments as expressions and evaluates them only when rendering the graph. So, in your loops, something like

```r
aes(y = df[,p], colour = place[p-1])
```

It is stored as-is and evaluated when the plot is rendered, after the loop has finished. At that point p = 3, so all of the layers are drawn with p = 3.
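The mechanism can be reproduced in base R without ggplot2. This sketch does not use ggplot2's internals; it only imitates the behaviour described above: quote() stores an unevaluated expression inside the loop, the way aes() does, while bquote() splices in the value p has at that moment. The function names lazy_labels() and eager_labels() are hypothetical, made up for this illustration.

```r
place <- c("Chicago", "Miami")

# Hypothetical helper: store unevaluated expressions in a loop (lazy),
# then evaluate them only after the loop has finished
lazy_labels <- function() {
  exprs <- list()
  for (p in 2:3) {
    exprs[[length(exprs) + 1]] <- quote(place[p - 1])
  }
  env <- environment()  # the loop variable p lives here and is 3 after the loop
  vapply(exprs, eval, character(1), envir = env)
}

# Hypothetical helper: splice the current value of p into each expression (eager)
eager_labels <- function() {
  exprs <- list()
  for (p in 2:3) {
    exprs[[length(exprs) + 1]] <- bquote(place[.(p) - 1])
  }
  env <- environment()
  vapply(exprs, eval, character(1), envir = env)
}

lazy_labels()
eager_labels()
```

lazy_labels() returns "Miami" "Miami", because both stored expressions look up p after the loop has left it at 3, whereas eager_labels() returns "Chicago" "Miami".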

So the "right" way to do this is to use melt(...) in the reshape2 package to convert your data from wide to long format and let ggplot manage the layers for you. I put "right" in quotes because there is a subtlety in this particular case. When calculating the violin densities from the melted data frame, ggplot scales the violins relative to the combined data (both Chicago and Miami). If you want the violins for each city to be scaled individually, you need to use loops (unfortunately).

The way around the lazy-evaluation problem is to reference the loop index in the data=... argument instead. That argument is not stored as an expression; the actual data are stored in the plot definition. So you can do this:

```r
g <- ggplot(df, aes(x = topic))
for (p in 2:length(df)) {
  gg.data <- data.frame(topic = df$topic, value = df[,p], city = names(df)[p])
  g <- g + geom_violin(data = gg.data, aes(y = value, color = city))
}
g
```

which gives the same result as your explicit version. Note that the index p does not appear in aes(...).
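If you are on ggplot2 3.0.0 or later, another option (not part of this 2014 answer) is to force the current value of p into the mapping with rlang's injection operator !!, so nothing in aes() refers to the loop variable at render time. A sketch, assuming a current ggplot2 and a smaller, made-up data frame in the question's wide layout (seed and distribution parameters are illustrative only):

```r
library(ggplot2)

# Illustrative wide data: one column per place
set.seed(42)
n <- 200
df <- data.frame(
  topic   = rep(c("Poverty %", "Mean Age", "% Smokers"), each = n),
  Chicago = rnorm(3 * n, mean = 35, sd = 7),
  Miami   = rnorm(3 * n, mean = 40, sd = 1.5)
)

g <- ggplot(df, aes(x = topic))
for (p in names(df)[-1]) {
  # !!p is replaced by the current string ("Chicago", then "Miami") when
  # aes() is called, so the layer no longer depends on the variable p
  g <- g + geom_violin(aes(y = .data[[!!p]], colour = !!p), alpha = 0.3)
}

# Build without drawing: one colour per layer, and the layers should differ
built <- ggplot_build(g)
layer_colours <- sapply(built$data, function(d) unique(d$colour))
layer_colours
```

Because the Chicago column is far more spread out than the Miami one in this made-up data, the first layer's y values should also cover a wider range than the second's, which is an easy way to confirm each layer kept its own column.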

Update: a note on scale="width" (mentioned in the comments). This makes all the violins the same width (see below), which is not the same scaling as in the OP's original code. IMO this is not a great way to visualize the data, as it suggests that the Chicago group has a lot more data.

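```r
library(reshape2)
gg <- melt(df, id.vars = "topic")  # long format: topic, variable, value
ggplot(gg) + geom_violin(aes(x = topic, y = value, color = variable),
                         alpha = 0.3, position = "identity", scale = "width")
```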
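To compare the two scalings in numbers rather than pictures, the sketch below builds the melted, single-layer plot twice and takes the widest point of each violin. It assumes the violinwidth column that stat_ydensity() adds to the built layer data (an internal detail that could change between ggplot2 versions), uses made-up data, and the helper max_width() is hypothetical. With scale = "width" every violin should reach a maximum width of 1, while under the default scale = "area" only the violin with the tallest density peak does, and the widely spread Chicago groups come out much narrower.

```r
library(ggplot2)
library(reshape2)

# Illustrative wide data: one column per place
set.seed(42)
n <- 200
df <- data.frame(
  topic   = rep(c("Poverty %", "Mean Age", "% Smokers"), each = n),
  Chicago = rnorm(3 * n, mean = 35, sd = 7),
  Miami   = rnorm(3 * n, mean = 40, sd = 1.5)
)
gg <- melt(df, id.vars = "topic")  # long format: topic, variable, value

# Hypothetical helper: maximum drawn width of each violin (one per topic x city),
# read from the assumed violinwidth column of the built layer data
max_width <- function(scale) {
  p <- ggplot(gg) +
    geom_violin(aes(x = topic, y = value, color = variable),
                alpha = 0.3, position = "identity", scale = scale)
  d <- ggplot_build(p)$data[[1]]
  tapply(d$violinwidth, d$group, max)
}

w_width <- max_width("width")  # every violin stretched to the same maximum width
w_area  <- max_width("area")   # default: widths relative to the tallest density
w_width
w_area
```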
