Multi-core processing in R can save a lot of time, but it isn’t always straightforward. If you’ve found your way here, you’ve probably hit one of the limitations of mclapply, the main gateway function to parallel processing in R’s “parallel” package. Problems tend to appear when memory runs low or too many processes are forked. If you’re getting out-of-memory errors, crashes or mysterious NULLs from mclapply, try one or more of the following fixes:
Survival tips for mclapply
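Set the parameter mc.preschedule = FALSE
With the default (TRUE), the input is divided among the workers up front, and if one worker is killed, every element in its share comes back as NULL. With FALSE, a separate process is forked for each element, so a failure affects only that element.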
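Here is a minimal sketch of that setting in use, along with a quick check for failed elements. The slow_square function and the worker count of 2 are made up for illustration, and the example assumes a Unix-like system, since mclapply does not support mc.cores > 1 on Windows.

```r
library(parallel)

# Hypothetical stand-in for a real task: square a number after a short pause.
slow_square <- function(x) {
  Sys.sleep(0.01)
  x^2
}

n_cores <- 2L  # assumed worker count; adjust to your machine

# mc.preschedule = FALSE forks one process per element of the input.
res <- mclapply(1:8, slow_square, mc.cores = n_cores, mc.preschedule = FALSE)

# Flag the "mysterious NULLs" (killed workers) and any caught errors.
failed <- vapply(
  res,
  function(r) is.null(r) || inherits(r, "try-error"),
  logical(1)
)
which(failed)
```

If which(failed) returns any indices, those are the elements worth re-running or investigating.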
Do more in each process to reduce the number of parallel processes.
Example: 4 processes of 100 seconds each are safer than 100 processes of 4 seconds each.
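One way to get fewer, larger jobs is to group the input into batches yourself and let each process loop over its batch. This matters most when mc.preschedule = FALSE, where every element would otherwise get its own fork. The work_one function and the choice of 2 cores below are assumptions for the sake of the sketch.

```r
library(parallel)

items <- 1:100
n_cores <- 2L  # assumed worker count

# Hypothetical per-item task.
work_one <- function(x) sqrt(x)

# Group the items into n_cores contiguous batches.
batches <- split(items, cut(seq_along(items), n_cores, labels = FALSE))

# Each forked process handles a whole batch with an ordinary lapply.
batch_res <- mclapply(
  batches,
  function(batch) lapply(batch, work_one),
  mc.cores = n_cores
)

# Flatten back to one result per item, in the original order.
res <- unlist(batch_res, recursive = FALSE, use.names = FALSE)
```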
Don’t reference global variables inside the function you give to mclapply; memory isn’t handled efficiently that way.
Pass data as function parameters instead.
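As a rough illustration, the worker below receives everything it needs as arguments rather than reaching into the global environment. The lookup vector and the score function are toy placeholders; the only mclapply behaviour relied on is that extra named arguments are forwarded to the function.

```r
library(parallel)

# Toy data standing in for something larger.
lookup <- c(a = 1, b = 2, c = 3)

# The worker takes the data as an explicit parameter.
score <- function(key, table) table[[key]] * 10

# 'table = lookup' is passed through mclapply's ... to score().
res <- mclapply(names(lookup), score, table = lookup, mc.cores = 2L)
```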
Avoid passing big chunks of data to mclapply.
Don’t send a large data frame when a single column will do.
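A small sketch of the idea, using an invented data frame: only the column the computation needs is handed to mclapply, not the whole frame.

```r
library(parallel)

# Invented example data; 'label' and 'id' are not needed by the computation.
df <- data.frame(
  id = 1:6,
  value = c(2, 4, 6, 8, 10, 12),
  label = letters[1:6]
)

halve <- function(v) v / 2

# Pass just the 'value' column rather than df itself.
res <- mclapply(df$value, halve, mc.cores = 2L)
```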
Keep a core spare for the system
Use mc.cores = detectCores() - 1.
Keep an eye on core activity and memory with the Linux/OS X “top” command.
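A slightly more defensive version of that setting might look like the sketch below. It guards against two edge cases: detectCores() can return NA when the count is unknown, and on a single-core machine the subtraction would give 0, which is not a valid mc.cores value.

```r
library(parallel)

n_detected <- detectCores()

# Leave one core for the system, but never drop below one worker.
n_cores <- if (is.na(n_detected)) 1L else max(1L, n_detected - 1L)

res <- mclapply(1:4, function(i) i * 2, mc.cores = n_cores)
```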
Break up a data frame using split: split(df, sample(1:N, nrow(df), replace = TRUE)), where N is the number of chunks.
Process the chunks separately, then reassemble them using rbindlist() from the data.table package.
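Put together, the split, process and reassemble steps could look something like this. The data frame, the process_chunk function and the values of N and mc.cores are all invented for the example, and the fallback to base rbind is only there in case data.table is not installed.

```r
library(parallel)

set.seed(1)  # make the random chunk assignment reproducible
df <- data.frame(id = 1:20, x = rnorm(20))
N <- 4  # number of chunks

# Randomly assign each row to one of N chunks.
chunks <- split(df, sample(1:N, nrow(df), replace = TRUE))

# Hypothetical per-chunk step: add a derived column.
process_chunk <- function(chunk) {
  chunk$x2 <- chunk$x^2
  chunk
}

processed <- mclapply(chunks, process_chunk, mc.cores = 2L)

# Reassemble; fall back to base rbind if data.table is not available.
result <- if (requireNamespace("data.table", quietly = TRUE)) {
  data.table::rbindlist(processed)
} else {
  do.call(rbind, processed)
}
```

The rows come back grouped by chunk rather than in their original order, so sort by an id column afterwards if the order matters.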