Jörn Alexander Quent's notebook

where I share semi-interesting stuff from my work

Running large simulations on a desktop computer with R (tips & tricks)

Let me briefly describe what I learned while attempting to run a large simulation on my desktop computer, as I think it might be useful to others. Here is the problem: I was forced to run a simulation on my desktop computer because brms couldn’t be installed on my department’s high-performance computing (HPC) cluster. My first realisation was: Don’t use Windows!

Problem 0: You cannot stop Windows from rebooting

Trying to get this simulation to finish showed me what a terrible working environment Windows actually is. The biggest problem is that Windows doesn’t give you a choice about whether to update your system. I wasted a couple of days trying to find out whether I could disable automatic updates, but it is simply not possible. As a consequence, the computer sometimes just restarted without my approval (outside of “working hours”, even though the simulation was still running). This infuriated me so much that I set up my desktop machine to dual-boot Windows & Ubuntu. If you don’t use Ubuntu (or another Linux distribution), I’d suggest you start.

Problem 1: The simulation takes several days but I also work on the computer

For this simulation I had to use my desktop computer in my office. This is the computer that I actually work on, so I had to come up with a way to pause and restart the simulation whenever I needed the machine for other tasks. This is done relatively easily: whenever I had to use the computer for something else, I just pressed ESC (when running the script via RStudio) or, in my case, CTRL + C (when running it via the console). This allowed me to stop and restart the simulation whenever the need arose.

Here are the most important steps:

  1. Use tryCatch().
  2. Add a function that saves the progress in case an error occurs.
  3. Use parallelisation if possible (more on this later).
  4. Add save points at which the progress is automatically saved, if you don’t save it for each iteration.
  5. Optional: Predict the time when the whole thing is supposed to finish.
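The mechanism behind steps 1 and 2 is that pressing ESC or CTRL + C raises an interrupt condition in R, which tryCatch() can intercept before the process exits. Here is a toy sketch of that pattern (the handler body and messages are purely illustrative, not part of my actual script):

```r
# Toy illustration: tryCatch() can intercept a user interrupt (ESC / CTRL + C)
# via an `interrupt` handler, while `finally` runs in every case.
result <- tryCatch({
  Sys.sleep(0.1)  # stand-in for one simulation step
  "finished"
}, interrupt = function(cond) {
  # here you would save your progress before exiting
  "interrupted"
}, finally = {
  cat("Clean-up code runs whether we finished or were interrupted.\n")
})
```

In my actual script below I rely on the finally argument, which runs in every case, to save the progress.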
# /* 
# ----------------------------- Prepare cluster ---------------------------
# */
library(foreach)  # provides foreach() and %dopar%

my.cluster <- parallel::makeCluster(parallel::detectCores() - 1, type = "PSOCK")
#my.cluster <- parallel::makeCluster(parallel::detectCores() - 2, type = "FORK")
doParallel::registerDoParallel(cl = my.cluster)

# /* 
# ----------------------------- Run simulation  ---------------------------
# */
# Get time at the start
startTime <- Sys.time()

# Main loop running through the number of simulations
tryCatch(
  for(rowIndex in startIndex:nSim){
    # Run chunks in parallel
    tempTest <- foreach(i = 1:nTests, .combine = "c", .packages = c('polspline', 'brms')) %dopar% {
      simulationFunction(36, 0)
    }
    
    # Check if it is a save point, then save the data to make sure progress is
    # saved regularly
    if(rowIndex %in% savePoints){
      exit_loop_gracefully(fileName, startIndex, rowIndex)
    }
    
    # Print & visualise predicted finish
    predicted_finish(startIndex, rowIndex, nSim, startTime)
    progressBar_plot(rowIndex, nSim)
    
  },
  finally = exit_loop_gracefully(fileName, startIndex, rowIndex)
)

# Stop
parallel::stopCluster(cl = my.cluster)

A word on parallelisation: I used foreach and doParallel here, and I found that it is important to run the parallelisation in chunks of the right size: each chunk should contain enough work to outweigh the overhead of dispatching jobs to the workers, but be small enough that progress can be saved between chunks. Here, I used foreach() nested inside a for()-loop.

The functions predicted_finish(), progressBar_plot() and exit_loop_gracefully() can be found here.

When restarting the script, make sure to load the progress at the beginning so you can pick up where you stopped:

# File name to save progress
fileName <- "correction_value_simulation.RData"

# Check if fileName already exists
if (file.exists(fileName)) {
  load(fileName)
  startIndex <- rowIndex - 1
} else {
  # If it is the first time starting the script
  startIndex <- 1
}
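For completeness, the snippets above also refer to a few parameters (nSim, nTests, savePoints, minimumSpace) that need to be defined near the top of the script. The values below are placeholders, not the ones from my actual simulation:

```r
# Hypothetical parameter block assumed by the snippets in this post
nSim         <- 1000                    # number of outer simulation iterations
nTests       <- 100                     # models fitted per parallel chunk
savePoints   <- seq(50, nSim, by = 50)  # save progress every 50 iterations
minimumSpace <- 5                       # abort below 5 GB of free disk space
```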

This could already be enough for most applications, but I encountered another obstacle that needed fixing.

Problem 2: brms creates temporary files that fill up the hard disk

The final problem I encountered was that brms creates temporary files for each model it fits. Normally this is not an issue, but if you fit hundreds of thousands of models, a few MB per model suddenly fills up the disk. The problem is that even if you delete these temporary files, the space is not freed until you close & restart R.
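Deleting the leftovers between fits can still help keep the disk tidy. Here is a small sketch (cleanup_tempdir() is a hypothetical name of mine, and, as noted above, space held by files the R process still has open is only released once R exits, hence the restart trick later in this post):

```r
# Sketch: delete leftover files from R's temporary directory.
# Note: space held by deleted-but-still-open files is only released
# once the R process exits, which is why a full restart is still needed.
cleanup_tempdir <- function() {
  leftovers <- list.files(tempdir(), full.names = TRUE, recursive = TRUE)
  unlink(leftovers, recursive = TRUE, force = TRUE)
  invisible(length(leftovers))  # number of files that were removed
}
```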

So I wrote a function that checks how much disk space is remaining…

check_root_disk_space <- function(){
  # Idea from ChatGPT
  # Execute the 'df' command and capture its output
  df_output <- system2("df", args = "-h /", stdout = TRUE)
  
  # Get available space (4th column of the second output line)
  avail_space_str <- strsplit(df_output[2], " +")[[1]][4]
  
  # Check if the available space is reported in GB, MB or KB
  if(grepl("G", avail_space_str)){
    measurement <- "G"
  } else if(grepl("M", avail_space_str)){
    measurement <- "M"
  } else if(grepl("K", avail_space_str)){
    measurement <- "K"
  }
  
  # Remove the unit letter
  avail_space <- as.numeric(gsub("[KMG]", "", avail_space_str))
  
  # Convert KB/MB to GB
  if(measurement == "K"){
    avail_space_in_GB  <- avail_space/1024^2
  } else if(measurement == "M"){
    avail_space_in_GB  <- avail_space/1024
  } else if(measurement == "G"){
    avail_space_in_GB  <- avail_space 
  }
  
  # Write to console
  cat("\n Notice: Available space in root is", avail_space_in_GB, "GB\n")
  
  # Return value
  return(avail_space_in_GB)
}

… and use this function to occasionally check if enough space is available and stop the script if it is not.

    # Check if enough space is left
    avail_space <- check_root_disk_space()
    if(avail_space < minimumSpace){
      stop("Not enough disk space left.")
    }

This, together with a bash script that restarts the R script if it doesn’t finish successfully, solved the remaining issues.

Making it extra robust with this bash script

To avoid crashes because my computer ran out of disk space (see above), I created this bash script (rscript_robust.sh) that automatically restarts the R script in case it encounters an error. It can be used as a robust alternative to calling Rscript directly.

#!/bin/bash

if [ $# -ne 1 ]; then
  echo "Usage: $0 <rscript>"
  exit 1
fi

rscript="$1"

while true; do
  if Rscript "$rscript"; then
    echo "R script finished successfully"
    break
  else
    echo "R script crashed. Restarting in 60 seconds..."
    sleep 60
  fi
done
