Running large simulations on a desktop computer with R (tips & tricks)
Let me briefly describe what I learned while trying to run a large simulation on my desktop computer, as I think it might be useful to others. Here is the problem: I was forced to run the simulation on my desktop computer because brms
couldn’t be installed on my department’s high-performance computing (HPC) cluster. My first realisation was: don’t use Windows!
Problem 0: You cannot stop Windows from rebooting
Trying to get this simulation to finish showed me what a terrible working environment Windows actually is. The biggest problem is that Windows doesn’t give you a choice about whether to update your system. I wasted a couple of days trying to find out whether I could disable automatic updates; it is just not possible. As a consequence, the computer sometimes just restarted without my approval (outside of “working hours”, even though the simulation was still running). This infuriated me so much that I set up my desktop machine to dual-boot Windows & Ubuntu. If you don’t use Ubuntu (or another Linux distribution), I’d suggest you start.
Problem 1: The simulation takes several days but I also work on the computer
For this simulation I had to use the desktop computer in my office. This is the computer that I actually work on, so I had to come up with a way to pause and restart the simulation whenever I needed the computer for other work. This is done relatively easily: whenever I have to use the computer for something else, I just press ESC (when running it via RStudio) or, in my case, CTRL + C (when running it via the console). This allowed me to stop and restart the simulation whenever the need arose.
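To illustrate the idea, here is a minimal sketch of an interruptible loop (the results object and file name are placeholders, not the actual simulation): pressing ESC or CTRL + C raises an interrupt condition, and a finally block still runs before R returns to the prompt, so progress can be saved on the way out.

```r
# Minimal sketch: an interruptible loop that saves its state on the way out.
results <- list()
tryCatch(
  for (i in seq_len(1000)) {
    results[[i]] <- sqrt(i)   # stand-in for one simulation step
    Sys.sleep(0.01)           # pressing ESC / CTRL + C interrupts here
  },
  interrupt = function(cond) cat("Interrupted - progress kept.\n"),
  finally = saveRDS(results, "progress.rds")  # runs on finish AND on interrupt
)
```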
Here are the most important steps:
- Use tryCatch().
- Add a function that saves the progress in case an error occurs.
- Use parallelisation if possible (more on this later).
- Add save points at which the progress is automatically saved, if you don’t save it after each iteration.
- Optional: Predict the time when the whole thing is supposed to finish.
# /*
# ----------------------------- Prepare cluster ---------------------------
# */
library(foreach) # provides foreach() and %dopar%
my.cluster <- parallel::makeCluster(parallel::detectCores() - 1, type = "PSOCK")
#my.cluster <- parallel::makeCluster(parallel::detectCores() - 2, type = "FORK")
doParallel::registerDoParallel(cl = my.cluster)
# /*
# ----------------------------- Run simulation ---------------------------
# */
# Get time at the start
startTime <- Sys.time()
# Main loop running through the number of simulations
tryCatch(
  for(rowIndex in startIndex:nSim){
    # Run chunks in parallel
    tempTest <- foreach(i = 1:nTests, .combine = "c", .packages = c('polspline', 'brms')) %dopar% {
      simulationFunction(36, 0)
    }
    # Check if it is a save point; if so, save the data to make sure progress
    # is saved regularly
    if(rowIndex %in% savePoints){
      exit_loop_gracefully(fileName, startIndex, rowIndex)
    }
    # Print & visualise predicted finish
    predicted_finish(startIndex, rowIndex, nSim, startTime)
    progressBar_plot(rowIndex, nSim)
  },
  finally = exit_loop_gracefully(fileName, startIndex, rowIndex)
)
# Stop
parallel::stopCluster(cl = my.cluster)
A word on parallelisation: I used foreach and doParallel here, and I found that it is important to run the parallelisation in chunks of the right size. Here, I used foreach() nested inside a for()-loop.
The functions predicted_finish(), progressBar_plot() and exit_loop_gracefully() can be found here.
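The linked versions are the authoritative ones; in case the link is unavailable, here are minimal sketches of what such helpers might look like (my own guesses, not the original code — in particular, the real exit_loop_gracefully() would also save the accumulated results):

```r
# Hypothetical sketches of the three helpers; the linked originals may differ.

# Estimate the finish time by linear extrapolation of the elapsed time
predicted_finish <- function(startIndex, rowIndex, nSim, startTime){
  done      <- rowIndex - startIndex + 1
  elapsed   <- as.numeric(difftime(Sys.time(), startTime, units = "secs"))
  remaining <- elapsed / done * (nSim - rowIndex)
  cat("Predicted finish:", format(Sys.time() + remaining), "\n")
}

# Simple text progress bar
progressBar_plot <- function(rowIndex, nSim){
  nDone <- round(50 * rowIndex / nSim)
  cat("\r[", strrep("=", nDone), strrep(" ", 50 - nDone), "] ",
      round(100 * rowIndex / nSim), "%", sep = "")
}

# Save the loop state so the script can be resumed later
exit_loop_gracefully <- function(fileName, startIndex, rowIndex){
  save(startIndex, rowIndex, file = fileName)
}
```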
When restarting the script, make sure to load the progress at the beginning, so you can pick up where you stopped:
# File name to save progress
fileName <- "correction_value_simulation.RData"
# Check if fileName already exists
if (file.exists(fileName)) {
  load(fileName)
  startIndex <- rowIndex - 1
} else {
  # If it is the first time starting the script
  startIndex <- 1
}
This could already be enough for most applications, but I actually encountered another obstacle that needed fixing.
Problem 2: brms creates temporary files that fill up the hard disk
The final problem that I encountered was that brms
creates temporary files for each model it fits. Normally this is not an issue, but if you create hundreds of thousands of models, then a few MB per model suddenly fill up the disk. The problem is that even if you delete these temporary files, the space is not freed until you close & restart R.
So I wrote a function that checks the current disk space that is remaining…
check_root_disk_space <- function(){
  # Idea from ChatGPT
  # Execute the 'df' command and capture its output
  df_output <- system2("df", args = "-h /", stdout = TRUE)
  # Get available space (fourth column; split on runs of whitespace)
  avail_space_str <- strsplit(df_output[2], "\\s+")[[1]][4]
  # Check if available space is given in GB, MB or KB
  if(grepl("G", avail_space_str)){
    measurement <- "G"
  } else if(grepl("M", avail_space_str)){
    measurement <- "M"
  } else if(grepl("K", avail_space_str)){
    measurement <- "K"
  }
  # Remove letter
  avail_space <- as.numeric(gsub("[KMG]", "", avail_space_str))
  # Convert KB/MB to GB (1 GB = 1024 MB = 1024^2 KB)
  if(measurement == "K"){
    avail_space_in_GB <- avail_space/1024^2
  } else if(measurement == "M"){
    avail_space_in_GB <- avail_space/1024
  } else if(measurement == "G"){
    avail_space_in_GB <- avail_space
  }
  # Write to console
  cat("\n Notice: Available space in root is", avail_space_in_GB, "GB\n")
  # Return value
  return(avail_space_in_GB)
}
… and use this function to occasionally check if enough space is available and stop the script if it is not.
# Check if there is enough space left
avail_space <- check_root_disk_space()
if(avail_space < minimumSpace){
  stop("Not enough disk space remaining.")
}
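In my main loop, a natural place for this check is at the save points (a sketch; minimumSpace in GB is an assumed variable): the script then stops only after the current progress has been written to disk.

```r
# Sketch: run the disk-space check at each save point inside the main loop.
minimumSpace <- 5  # assumed threshold in GB
if (rowIndex %in% savePoints) {
  exit_loop_gracefully(fileName, startIndex, rowIndex)
  if (check_root_disk_space() < minimumSpace) {
    stop("Not enough disk space remaining.")
  }
}
```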
This, together with a bash script that restarts the R script if it doesn’t finish successfully, solved the remaining issues.
Making it extra robust with this bash script
To avoid crashes because my computer ran out of memory (see above), I created this bash script (rscript_robust.sh) that automatically restarts the R script in case it encounters an error. This can be used as a robust alternative to Rscript.
#!/bin/bash
if [ $# -ne 1 ]; then
  echo "Usage: $0 <rscript>"
  exit 1
fi
rscript="$1"
while true; do
  Rscript "$rscript"
  if [ $? -eq 0 ]; then
    echo "R script finished successfully"
    break
  else
    echo "R script crashed. Restarting in 60 seconds..."
    sleep 60
  fi
done
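Usage then looks like this (simulation.R is a placeholder for your own script):

```shell
chmod +x rscript_robust.sh         # make the wrapper executable once
./rscript_robust.sh simulation.R   # runs Rscript and restarts it on failure
```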