Running multiple R scripts at the same time

In my thesis, I need to run a lot of simulation studies, all of which take quite a long time. My computer has 4 cores, so I was wondering: is it possible to run two R scripts in RStudio at the same time, letting them use two different cores? If so, I could save a lot of time just by leaving the computer running all of these scripts overnight.

+16
r rstudio
5 answers

EDIT: Given the improvements to RStudio, this method is no longer the best way to do this - see Tom Kelly's answer below.


Assuming that the results do not have to be in the same environment, you can achieve this with RStudio projects: https://support.rstudio.com/hc/en-us/articles/200526207-Using-Projects

First, create two separate projects. You can open both at the same time, which results in two separate rsession processes. You can then open one script in each project and run each of them independently. Your operating system will handle distributing the processes across cores.

+15

In RStudio

If you right-click on RStudio, you can open several separate "sessions" of RStudio (whether or not you use Projects). By default, each will use one core.

Update (July 2018): RStudio v1.2.830-1, available as a preview release, supports a Jobs pane. This is for running R scripts in the background, separately from the interactive R session:

  • Run any R script as a background job in a clean R session

  • Monitor progress and see script output in real time

  • Optionally give jobs your global environment at startup, and export values back when complete

This will be available in RStudio version 1.2.
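
If you prefer launching such a job from code rather than from the IDE, the rstudioapi package exposes the same feature. A minimal sketch, assuming RStudio 1.2+ and a script.R in the working directory (both placeholders):

 library(rstudioapi)

 # run script.R as a background job; copy the current global environment
 # into the job, and export the job's results back when it completes
 jobRunScript("script.R", importEnv = TRUE, exportEnv = "R_GlobalEnv")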

Running scripts in the terminal

If you have several scripts that you know run without errors, I would recommend running them in parallel, with different parameters, from the command line:

 R CMD BATCH script.R
 Rscript script.R
 R --vanilla < script.R

Running in the background:

 nohup Rscript script.R & 

Here "&" runs the script in the background (it can be obtained with fg , tracked with htop and killed with kill <pid> or pkill rsession ), and nohup saves the output to a file and continues to work if the terminal is closed,

Passing arguments to a script:

 Rscript script.R 1 2 3 

This will pass c(1, 2, 3) to R as the output of commandArgs(), so a bash loop can launch multiple instances of Rscript:

 for ii in 1 2 3
 do
   nohup Rscript script.R $ii &
 done
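
For completeness, inside script.R these arguments arrive as character strings via commandArgs(); a minimal sketch of reading them (the variable names are illustrative):

 # in script.R: read the trailing command-line arguments (character strings)
 args <- commandArgs(trailingOnly = TRUE)
 ii <- as.integer(args[1])  # convert before using as a number
 cat("running replicate", ii, "\n")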

Running parallel code in R

You will often find that one particular step in your R script is what slows the whole computation down; in that case, may I suggest running parallel code within your R script rather than running the scripts separately? I would recommend the snow package for running loops in parallel in R. Generally, instead of a plain lapply call, use:

 library(snow)
 cl <- makeCluster(n)  # n = number of cores (I'd recommend one less than machine capacity)
 clusterExport(cl, list = ls())  # export input data to all cores
 output_list <- parLapply(cl, input_list, function(x) ...)
 stopCluster(cl)  # close the cluster when complete (particularly on shared machines)

Use this wherever you would normally use lapply in R, and the loop will run in parallel.
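
As a concrete toy sketch of the pattern above (the input list and function are made up for illustration):

 library(snow)

 cl <- makeCluster(3)          # e.g. 3 of 4 cores
 input_list <- as.list(1:100)  # toy input
 output_list <- parLapply(cl, input_list, function(x) x^2)
 stopCluster(cl)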

+14

You can get multi-core parallelism within a single session (as described here: https://cran.r-project.org/web/packages/doMC/vignettes/gettingstartedMC.pdf ) with the following code:

 if (Sys.info()["sysname"] == "Windows") {
   library(doParallel)
   cl <- makeCluster(numberOfCores)
   registerDoParallel(cl)
 } else {
   library(doMC)
   registerDoMC(numberOfCores)
 }
 library(foreach)
 someList <- list("file1", "file2")
 returnComputation <- foreach(x = someList) %dopar% {
   source(x)
 }
 if (Sys.info()["sysname"] == "Windows") stopCluster(cl)

You will need to adapt the output to your needs.
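
For instance, a sketch of adapting the output: if each iteration returns a single number, foreach's .combine argument can collect the results into a vector (this assumes a backend has been registered as above):

 library(foreach)

 # each iteration returns one number; .combine = c concatenates them
 squares <- foreach(x = 1:10, .combine = c) %dopar% {
   x^2
 }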

+4

All you have to do (assuming you are using Unix/Linux) is run an R batch command and put it in the background. The operating system will automatically allocate it to a processor core.

In the shell, do:

 /your/path/$ nohup R CMD BATCH --no-restore my_model1.R &
 /your/path/$ nohup R CMD BATCH --no-restore my_model2.R &
 /your/path/$ nohup R CMD BATCH --no-restore my_model3.R &
 /your/path/$ nohup R CMD BATCH --no-restore my_model4.R &

Each command executes the script, saves the printed output to the matching .Rout file (e.g. my_model1.Rout), and saves all created R objects to a .RData file. The models will run on different cores, and the session transcript and output end up in those output files.

If you are doing this remotely through a terminal, you will need the nohup command; otherwise the processes will be terminated when you log out of the session.

 /your/path/$ nohup R CMD BATCH --no-restore my_model1.R & 

If you want to give processes a low priority, you do:

 /your/path/$ nohup nice -n 19 R CMD BATCH --no-restore my_model.R & 

It is best practice to include some code at the beginning of each script to load and attach the relevant data file.
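
A minimal sketch of such a header (the file names are placeholders); this matters because with --no-restore no workspace is restored for you:

 # load the data this model needs explicitly, since --no-restore
 # means the saved .RData workspace is not restored at startup
 load("my_data.RData")  # or: my_data <- readRDS("my_data.rds")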

NEVER just do:

 /your/path/$ nohup R CMD BATCH my_model1.R & 

Without --no-restore, the batch job restores and then overwrites your .RData file (all your carefully built objects live there too), which seriously undermines reproducibility. That is,

 --no-restore 

or

 --vanilla 

are your dear friends.

If you have too many models, I suggest doing the calculations on a cloud account, as you can get more CPUs and RAM there. Depending on what you are doing and the R packages involved, models can take hours even on current hardware.

I learned this the hard way, but there is a good document here:

http://users.stat.umn.edu/~geyer/parallel/parallel.pdf

HTH.

+2

If your problem is embarrassingly parallel, you can open as many terminals as you like in the Terminal tab (located right after the Console tab) and run your code in each with Rscript yourcode.R . Each run will use a separate core by default. If necessary, you can also pass command-line arguments (as @Tom Kelly mentioned).

+1
