Writing a loop to create ggplot figures with different data sources and titles

Question

I have no experience with loops, but it looks like I will need some to analyze my data properly. Could you show me how to turn the code I have already written into a simple loop? This is the code I currently use to produce some graphs:

    pdf(file = sprintf("complex I analysis", tbl_comp_abu1), paper='A4r')

    ggplot(df_tbl_data1_comp1, aes(Size_Range, Abundance, group=factor(Gene_Name))) +
      theme(legend.title=element_blank()) +
      geom_line(aes(color=factor(Gene_Name))) +
      ggtitle("Data1 - complex I") +
      theme(axis.text.x = element_text(angle = 90, hjust = 1))

    ggplot(df_tbl_data2_comp1, aes(Size_Range, Abundance, group=factor(Gene_Name))) +
      theme(legend.title=element_blank()) +
      geom_line(aes(color=factor(Gene_Name))) +
      ggtitle("Data2 - complex I") +
      theme(axis.text.x = element_text(angle = 90, hjust = 1))

    ggplot(df_tbl_data3_comp1, aes(Size_Range, Abundance, group=factor(Gene_Name))) +
      theme(legend.title=element_blank()) +
      geom_line(aes(color=factor(Gene_Name))) +
      ggtitle("Data3 - complex I") +
      theme(axis.text.x = element_text(angle = 90, hjust = 1))

    dev.off()

Here is what I would like to achieve. I have 10 complexes to analyze, so I need to create 10 PDF files; the example above shows the graphs from three different data sets for one complex. The number in comp1 (from df_tbl_dataX_comp1) must vary from 1 to 10, depending on which complex is being plotted. The loop also needs to change the name of the PDF file and the title of each graph. Can such a loop be written?

Data:

 structure(list(Size_Range = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L, 7L, 7L, 8L, 8L, 8L, 9L, 9L, 9L, 10L, 10L, 10L, 11L, 11L, 11L, 12L, 12L, 12L, 13L, 13L, 13L, 14L, 14L, 14L, 15L, 15L, 15L, 16L, 16L, 16L, 17L, 17L, 17L, 18L, 18L, 18L, 19L, 19L, 19L, 20L, 20L, 20L), .Label = c("10", "34", "59", "84", "110", "134", "165", "199", "234", "257", "362", "433", "506", "581", "652", "733", "818", "896", "972", "1039" ), class = "factor"), Abundance = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 142733.475, 108263.525, 98261.11, 649286.165, 3320759.803, 3708515.148, 6691260.945, 30946562.92, 180974.3725, 4530005.805, 21499827.89, 0, 15032198.54, 4058060.583, 0, 3842964.97, 2544030.857, 0, 1640476.977, 286249.1775, 0, 217388.5675, 1252965.433, 0, 1314666.05, 167467.8825, 0, 253798.15, 107244.9925, 0, 207341.1925, 15755.485, 0, 71015.85, 14828.5075, 0, 25966.2325, 0, 0, 0, 0, 0, 0), Gene_Name = c("AT1G01080", "AT1G01090", "AT1G01320", "AT1G01420", "AT1G01470", "AT1G01560", "AT1G01800", "AT1G02150", "AT1G02500", "AT1G02560", "AT1G02780", "AT1G02880", "AT1G02920", "AT1G02930", "AT1G03030", "AT1G03090", "AT1G03110", "AT1G03130", "AT1G03220", "AT1G03230", "AT1G03330", "AT1G03475", "AT1G03630", "AT1G03680", "AT1G03870", "ATCG00420", "ATCG00470", "ATCG00480", "ATCG00490", "ATCG00500", "ATCG00650", "ATCG00660", "ATCG00670", "ATCG00740", "ATCG00750", "ATCG00842", "ATCG01100", "ATCG01030", "ATCG01114", "ATCG01665", "ATCG00770", "ATCG00780", "ATCG00800", "ATCG00810", "ATCG00820", "ATCG00722", "ATCG00744", "ATCG00855", "ATCG00853", "ATCG00888", "ATCG00733", "ATCG00766", "ATCG00812", "ATCG00821", "ATCG00856", "ATCG00830", "ATCG00900", "ATCG01060", "ATCG01110", "ATCG01120")), .Names = c("Size_Range", "Abundance", "Gene_Name" ), row.names = c(NA, -60L), class = "data.frame")

+7

r ggplot2

Shaxi liver Oct 21 '15 at 13:09

3 answers

After writing my answer, I realized that it does not address the actual question about loops. However, I hope it shows you another way to approach your underlying problem (in other words, I did not want the work to go to waste).

I could not get your plot to work with the data you posted. The 60-row data frame contains 60 unique gene names. When you use geom_line and group by gene ( aes(group=Gene_Name) ), each group contains only one point. You need at least two points to draw a line.
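One quick way to confirm this is to count the rows per gene before plotting. The sketch below uses a small made-up data frame (`toy` is hypothetical, not the asker's data) in which gene "A" has two measurements and the other genes have one each; only genes with at least two rows can be drawn as a line.

```r
# Hypothetical toy data: gene "A" has two measurements, "B" and "C" have one each
toy <- data.frame(
  Size_Range = c(10, 34, 10, 34),
  Abundance  = c(0, 5, 2, 3),
  Gene_Name  = c("A", "A", "B", "C"),
  stringsAsFactors = FALSE
)

# Number of rows (points) available for each gene
points_per_gene <- table(toy$Gene_Name)

# Genes that have at least two points and can therefore form a line
plottable_genes <- names(points_per_gene)[points_per_gene >= 2]
print(plottable_genes)
```

If every gene name in a data frame is unique, as described above for the posted data, every count is 1 and no gene can be drawn as a line.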

I simulated some data and ran the analysis on it.

    library(truncnorm)  # rtruncnorm()
    library(dplyr)      # %>%, group_by(), do()
    library(ggplot2)

    # Function to generate random data
    generate_data = function() {
      gene_names = LETTERS[1:20]
      n_genes = length(gene_names)
      size_ranges = c(10, 34, 59, 84, 110, 134, 165, 199, 234, 257,
                      362, 433, 506, 581, 652, 733, 818, 896, 972, 1039)
      gene_size_means = rtruncnorm(n_genes, 10, 1000, 550, 300)
      genes_in_complex = rbinom(n_genes, 1, 0.3)
      true_variance = 50
      gene_size_variances = rchisq(n_genes, n_genes-1) * (true_variance/(n_genes-1))
      df = data.frame(gene_name=gene_names, gene_mean=gene_size_means,
                      gene_var=gene_size_variances, in_complex=genes_in_complex)
      # One row per gene and size range; note that gene_var is passed
      # as the sd argument of dnorm()
      df = df %>%
        group_by(gene_name) %>%
        do(data.frame(size_ranges,
                      abundance=dnorm(size_ranges, .$gene_mean, .$gene_var)*.$in_complex))
      return(df)
    }

    # Generate a list of tables. Each table is for one data set for one complex
    data_tables = list()
    n_comps = 3
    for( complex_i in 1:2 ) {
      for( comp_j in 1:n_comps ) {
        loop_df = generate_data()
        loop_df$comp = comp_j
        loop_df$complex = complex_i
        data_tables = c(data_tables, list(loop_df))
      }
    }

    # Concatenate the tables into a larger data frame
    dat = do.call(rbind, data_tables)

    # Make a plot for each data set for complex 1
    dat_complex1 = subset(dat, complex==1)
    p = ggplot(dat_complex1, aes(x=size_ranges, y=abundance, color=gene_name, group=gene_name)) +
      geom_line() +
      facet_wrap(~comp, ncol=1)
    print(p)

    # Make a plot with many subpanels for all complexes and data sets
    p %+% dat + facet_grid(comp~complex)

So, are you studying protein complexes in Arabidopsis? If readers are familiar with your domain, some background can help them answer your question. Alternatively, an image of the desired output may help. More complete example data and/or screenshots may also attract more interest in your future posts.

+2

kdauria Oct 28 '15 at 5:03

Take a look at this approach. It relies on a data.frame ( dat ) that contains the names of your datasets, the graph titles, and the file names.

First, I create a function that builds a graph and saves it. Then I call the function in a for loop and, alternatively, with apply (use whichever you find faster).

The code is as follows:

    library(ggplot2)

    # create a custom function for ggplot,
    # which creates the plot and then saves it as a pdf
    custom_ggplot_function <- function(input.data.name, graph.title, f.name){
      # get(input.data.name) gets you the variable which is stored as a string in
      # input.data.name
      p <- ggplot(get(input.data.name), aes(Size_Range, Abundance, group=factor(Gene_Name))) +
        theme(legend.title=element_blank()) +
        geom_line(aes(color=factor(Gene_Name))) +
        ggtitle(graph.title) +
        theme(axis.text.x = element_text(angle = 90, hjust = 1))
      ggsave(filename = paste0(f.name, ".pdf"), plot = p)
      NULL
    }

    # dat contains the names of your datasets, the titles of the graphs and filenames
    # (stringsAsFactors = FALSE keeps the names as character strings, which get() requires)
    dat <- data.frame(df.names = c("df_tbl_data1_comp1", "df_tbl_data2_comp1"),
                      graph.titles = c("Data1 - Complex I", "Data2 - Complex I"),
                      file.names = c("file1", "file2"),
                      stringsAsFactors = FALSE)
    # If you create your data.frame dat, you can also say
    # df.names = paste0("df_tbl_data", 1:10, "_comp1") and
    # graph.titles = paste0("Data", 1:10, " - Complex ", 1:10)

    # loop through the rows of dat
    for (i in 1:nrow(dat)) {
      custom_ggplot_function(input.data.name = dat[i, "df.names"],
                             graph.title = dat[i, "graph.titles"],
                             f.name = dat[i, "file.names"])
    }

    # or using the apply function
    apply(dat, 1, function(row.el) {
      custom_ggplot_function(input.data.name = row.el["df.names"],
                             graph.title = row.el["graph.titles"],
                             f.name = row.el["file.names"])
    })
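For the full set of 3 data sets and 10 complexes described in the question, the lookup table could be generated rather than typed by hand. This is a minimal sketch, assuming the data frames follow the df_tbl_data<d>_comp<c> naming pattern from the question; the name `dat_all`, the columns `dataset` and `complex`, and the file-name pattern are illustrative choices of mine, not part of the original answer.

```r
# All combinations of data set (1 to 3) and complex (1 to 10)
combos <- expand.grid(dataset = 1:3, complex = 1:10)

# Lookup table with the same three columns as `dat`
# (the file-name pattern is an assumption)
dat_all <- data.frame(
  df.names     = paste0("df_tbl_data", combos$dataset, "_comp", combos$complex),
  graph.titles = paste0("Data", combos$dataset, " - complex ", combos$complex),
  file.names   = paste0("data", combos$dataset, "_complex", combos$complex),
  stringsAsFactors = FALSE
)

head(dat_all, 3)
```

Passing `dat_all` to the loop in place of `dat` would save each graph separately with ggsave, giving one PDF per graph (30 files) rather than one PDF per complex.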

+1

David Oct 30 '15 at 9:57

maRtin · Accepted Answer · 2015-10-21T13:37:51+0000

This should do the trick: set up two nested loops, one iterating over the complexes and one iterating over the data sets. Then use paste0() or paste() to build the correct file names and titles.

PS: I have not tested the code since I have no data. But that should give you an idea.

    library(ggplot2)

    # loop over complexes
    for (comp in 1:10) {
      # create one pdf for every complex
      pdf(file = paste0("complex", comp, "analysis.pdf"), paper='A4r')
      # loop over datasets
      for (d in 1:3) {
        # build the plot; inside a loop it must be print()ed explicitly,
        # otherwise nothing is drawn to the pdf
        p <- ggplot(get(paste0("df_tbl_data", d, "_comp", comp)),
                    aes(Size_Range, Abundance, group=factor(Gene_Name))) +
          theme(legend.title=element_blank()) +
          geom_line(aes(color=factor(Gene_Name))) +
          ggtitle(paste0("Data", d, " - complex ", comp)) +
          theme(axis.text.x = element_text(angle = 90, hjust = 1))
        print(p)
      }
      dev.off()
    }
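For reference, here is a self-contained variant of the nested loop that runs without the asker's data. It is only a sketch: the toy data, the helper `make_toy`, the `tables` list (used here in place of get()), and the output location are all hypothetical. Each plot is wrapped in print(), since ggplot objects are not drawn automatically inside a for loop.

```r
library(ggplot2)

# Hypothetical toy data: two genes measured at three size ranges
make_toy <- function() {
  data.frame(
    Size_Range = factor(rep(c(10, 34, 59), times = 2)),
    Abundance  = c(0, 5, 1, 2, 3, 0),
    Gene_Name  = rep(c("geneA", "geneB"), each = 3)
  )
}

# Store the tables in a named list instead of separate global variables
tables <- list()
for (comp in 1:2) {
  for (d in 1:3) {
    tables[[paste0("data", d, "_comp", comp)]] <- make_toy()
  }
}

out_dir <- tempdir()        # write to a temporary directory for this demo
pdf_files <- character(0)

for (comp in 1:2) {
  # One PDF per complex, one page per data set
  pdf_path <- file.path(out_dir, paste0("complex", comp, "_analysis.pdf"))
  pdf(file = pdf_path, paper = "a4r")
  for (d in 1:3) {
    p <- ggplot(tables[[paste0("data", d, "_comp", comp)]],
                aes(Size_Range, Abundance, group = Gene_Name, color = Gene_Name)) +
      geom_line() +
      ggtitle(paste0("Data", d, " - complex ", comp))
    print(p)  # explicit print: plots are not auto-printed inside a loop
  }
  dev.off()
  pdf_files <- c(pdf_files, pdf_path)
}
```

Keeping the data frames in a named list avoids looking variables up by name with get(), and makes it easy to check which tables exist with names(tables).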
