What data structure should I use to store names to which suffixes are added individually?

I have to load data from files associated with several experiments and process them to generate a graph. Each experiment generates several files. The files for experiment 1 have names that start with Experiment1, followed by a suffix for the data type: Experiment1-per0, Experiment1-per50, Experiment1-per100.

These suffixes are the same for all experiments. Therefore, to load the files, I want to supply only the experiment names and have the R script append the suffixes. So for each experiment name “ExperimentX” that I give, the script should load three separate data files by appending the suffixes (i.e. “ExperimentX-per0”, “ExperimentX-per50”, “ExperimentX-per100”).

I cannot figure out which data structure I should use to store the base experiment names and then the suffixed names.

Example file (Experiment1-per50):

    # the last column also shows the type of data, i.e. the postfix of the file
    Obj   TGiven TUsed TOGiven TOServed per50
    16570 8      7     12      6        per50
    18430 8      8     12      9        per50
    16890 8      7     12      9        per50

Currently, I enter each file name manually, which takes a lot of time.

2 answers

If each experiment has the same set of suffixes, you can save the list of experiment names and suffix names separately. Then, using a nested loop, you can combine the experiment name and suffix name using the paste function to get the file name.

The code might look something like this:

    experiments = c("Experiment1","Experiment2","Experiment3")
    suffixes = c("per0","per50","per100")
    for (experiment in experiments) {
      for (suffix in suffixes) {
        filename <- paste(experiment, suffix, sep="-")
        df <- read.table(filename)
        df$experiment <- experiment
        # Do something with the dataframe here
      }
    }
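
The loop above overwrites df on every pass, so if the tables are needed afterwards, one option is to collect them in a list keyed by file name. The following is only a sketch under stated assumptions: the temporary directory, the made-up example values, the `Type` column and the `tables` list are hypothetical and exist only so the code runs on its own. It also assumes the files have a header row, hence `header = TRUE`.

```r
experiments <- c("Experiment1", "Experiment2")
suffixes <- c("per0", "per50", "per100")

# Hypothetical setup: write small example files to a temporary directory
# so that the sketch is self-contained.
data_dir <- tempfile("experiments")
dir.create(data_dir)
for (experiment in experiments) {
  for (suffix in suffixes) {
    example <- data.frame(Obj = c(16570, 18430), TGiven = c(8, 8), Type = suffix)
    write.table(example, file.path(data_dir, paste(experiment, suffix, sep = "-")),
                row.names = FALSE, quote = FALSE)
  }
}

# Collect every table in a list keyed by its file name.
tables <- list()
for (experiment in experiments) {
  for (suffix in suffixes) {
    filename <- paste(experiment, suffix, sep = "-")
    df <- read.table(file.path(data_dir, filename), header = TRUE)
    df$experiment <- experiment
    tables[[filename]] <- df
  }
}

names(tables)  # one entry per experiment/suffix combination
```

A single table can then be retrieved by name, for example `tables[["Experiment1-per50"]]`.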

Alternatively, if you only need a vector of all the file names from the given lists of experiments and suffixes, this will combine them:

    as.vector(sapply(experiments, paste, suffixes, sep="-"))
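
If it helps to keep the experiment and suffix next to each file name, a small lookup table is another possible structure. This is a sketch using base R's `expand.grid`; the `files` data frame and its column names are my own illustrative choices, not something from the question.

```r
experiments <- c("Experiment1", "Experiment2", "Experiment3")
suffixes <- c("per0", "per50", "per100")

# One row per experiment/suffix combination; the first argument varies
# fastest, so the suffixes cycle within each experiment.
files <- expand.grid(suffix = suffixes, experiment = experiments,
                     stringsAsFactors = FALSE)
files$filename <- paste(files$experiment, files$suffix, sep = "-")

# All file names that belong to one experiment
files$filename[files$experiment == "Experiment2"]
```

The `filename` column holds the same names, in the same order, as the `sapply` one-liner, but each name stays attached to the experiment and suffix it came from.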

If all columns are different

If the columns differ between experiments, I would store the experiments in nested lists as follows:

    library(plyr);
    library(stringr);  # provides str_c
    experiments <- c("Experiment1","Experiment2","Experiment3");
    suffixes <- c("per0","per50","per100");
    # if you want to go ahead and get the data
    data <- llply(experiments, function(experiment) {
      llply(suffixes, function(suffix) {
        # make filename (adjust the separator and extension to match your files)
        fn <- str_c(experiment,'_',suffix,'.csv');
        # later, try to read fn, now just return
        return(fn);
      })
    })

Then you can iterate through data for further processing. llply is part of the plyr package. It iterates over a list (the first l in llply) and returns a list (the second l).
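
To illustrate how such a nested list can be walked, here is a rough base-R sketch that uses `lapply` in place of `llply` so that it runs without plyr. Labelling each level with `setNames` is my own addition (an assumption, not part of the approach above); it allows entries to be looked up by experiment and suffix.

```r
experiments <- c("Experiment1", "Experiment2", "Experiment3")
suffixes <- c("per0", "per50", "per100")

# Base-R counterpart of the nested llply call; setNames labels each level.
data <- lapply(setNames(experiments, experiments), function(experiment) {
  lapply(setNames(suffixes, suffixes), function(suffix) {
    paste0(experiment, "_", suffix, ".csv")
  })
})

# Look up a single entry by name
data[["Experiment2"]][["per50"]]

# Walk every experiment/suffix pair
visited <- character(0)
for (experiment in names(data)) {
  for (suffix in names(data[[experiment]])) {
    fn <- data[[experiment]][[suffix]]
    visited <- c(visited, fn)  # placeholder for real processing
  }
}
```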

If all columns are the same

    library(plyr);
    library(stringr);  # provides str_c
    experiments <- c("Experiment1","Experiment2","Experiment3");
    suffixes <- c("per0","per50","per100");
    data <- ldply(experiments, function(experiment) {
      ldply(suffixes, function(suffix) {
        data.frame(
          experiment = experiment,
          suffix = suffix,
          fn = str_c(experiment,'_',suffix,'.csv'))
      })
    })

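As written, this returns a single data.frame with one row per experiment/suffix combination. If the inner function reads the file instead of only building its name, ldply combines all the data into one data.frame, which you can then analyze as needed (e.g. using plyr and/or subset).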
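
For comparison, the same "one data.frame" idea can be sketched in base R with `lapply` and `rbind`, which is roughly the combination that ldply performs. The temporary directory, the made-up CSV contents and the helper `read_one` are hypothetical and only make the example self-contained.

```r
experiments <- c("Experiment1", "Experiment2")
suffixes <- c("per0", "per50", "per100")

# Hypothetical setup: small CSV files in a temporary directory.
data_dir <- tempfile("experiments")
dir.create(data_dir)
for (experiment in experiments) {
  for (suffix in suffixes) {
    write.csv(data.frame(Obj = c(16570, 18430), TUsed = c(7, 8)),
              file.path(data_dir, paste0(experiment, "_", suffix, ".csv")),
              row.names = FALSE)
  }
}

# Read one file and tag its rows with the experiment and suffix.
read_one <- function(experiment, suffix) {
  fn <- file.path(data_dir, paste0(experiment, "_", suffix, ".csv"))
  data.frame(experiment = experiment, suffix = suffix, read.csv(fn))
}

# Row-bind the files of each experiment, then row-bind the experiments.
all_data <- do.call(rbind, lapply(experiments, function(experiment) {
  do.call(rbind, lapply(suffixes, function(suffix) read_one(experiment, suffix)))
}))

# Pick out one experiment/suffix combination
subset(all_data, experiment == "Experiment2" & suffix == "per50")
```

This only works when every file has the same columns, which is the assumption of this section.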
