Multi-core processing in R can save a lot of time, but it isn’t always straightforward. If you’ve found your way here, you’ve probably hit one of the limitations of mclapply – the main gateway function to parallel processing in R’s “parallel” package. Problems tend to appear when memory runs low or too many child processes are forked at once (mclapply forks processes rather than spawning threads). If you’re getting out-of-memory errors, crashes or mysterious NULLs with mclapply, try one or more of the following fixes:

 

Survival tips for mclapply

Set the parameter mc.preschedule = FALSE

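With too many processes, the scheduling of returned results can become muddled. By default (mc.preschedule = TRUE) the inputs are divided among the workers up front, so each worker handles a batch of values; with FALSE, one process is forked per input value, which suits jobs whose run times vary a lot.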
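Here’s a minimal sketch of that setting in use. The function slow_square and the choice of two cores are just assumptions for illustration; swap in your own task. It also shows one way to spot elements that came back as NULL or as an error.

```r
library(parallel)

# Hypothetical task standing in for real work with uneven run times.
slow_square <- function(x) {
  Sys.sleep(0.01 * x)
  x^2
}

inputs <- 1:8

# Forking is not available on Windows, so fall back to a single core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# mc.preschedule = FALSE forks one child per input value instead of
# dividing the inputs among the workers up front.
results <- mclapply(inputs, slow_square,
                    mc.cores = n_cores,
                    mc.preschedule = FALSE)

# Flag any element that came back as NULL or as a "try-error" object.
failed <- vapply(results,
                 function(r) is.null(r) || inherits(r, "try-error"),
                 logical(1))
```

If any(failed) is TRUE, those elements are the ones worth re-running or investigating.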

Do more in each process to reduce the number of parallel processes.

Example: 4 × 100-second processes are safer than 100 × 4-second processes.
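One way to do this is to group the work into a few chunks yourself and let each process loop over its whole chunk. This is only a sketch: the 100 items, the 4 chunks and the sqrt() stand-in are assumptions, not anything special about mclapply.

```r
library(parallel)

items <- 1:100
n_chunks <- 4

# Forking is not available on Windows, so fall back to a single core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Assign each item to one of n_chunks contiguous groups of equal size.
chunk_id <- ceiling(seq_along(items) / (length(items) / n_chunks))
chunks <- split(items, chunk_id)

# Each process works through an entire chunk, so mclapply sees
# 4 long tasks instead of 100 short ones.
chunk_results <- mclapply(chunks, function(chunk) {
  vapply(chunk, function(x) sqrt(x), numeric(1))
}, mc.cores = n_cores)

# Flatten back to one result per original item, in the original order.
results <- unlist(chunk_results, use.names = FALSE)
```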

Don’t reference globals inside the function you hand to mclapply – it doesn’t handle memory efficiently.

Pass data as function parameters instead.
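As a rough sketch: mclapply forwards any extra named arguments to your function, just like lapply does, so the data can travel as a parameter. The names scale_value, lookup and multiplier below are made up for the example.

```r
library(parallel)

# Forking is not available on Windows, so fall back to a single core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Hypothetical lookup table that the worker function needs.
lookup <- c(a = 1, b = 2, c = 3)

# The function receives everything it needs as arguments
# rather than reaching out to the global environment.
scale_value <- function(key, table, multiplier) {
  table[[key]] * multiplier
}

# Extra named arguments after FUN are passed on to scale_value.
results <- mclapply(names(lookup), scale_value,
                    table = lookup, multiplier = 10,
                    mc.cores = n_cores)
```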

Avoid passing big chunks of data to mclapply.

Don’t send a large data frame when a column will do.
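For instance, if the calculation only needs one column, pull that column out first and hand mclapply the vector. The data frame and column names here are invented for the sketch.

```r
library(parallel)

# Forking is not available on Windows, so fall back to a single core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Hypothetical data frame; only the "value" column is needed for the work.
df <- data.frame(id = 1:6,
                 value = c(2, 4, 6, 8, 10, 12),
                 label = letters[1:6])

# Extract just the column the workers need.
values <- df$value

doubled <- mclapply(values, function(v) v * 2, mc.cores = n_cores)

# Attach the results back onto the full data frame in the parent process.
df$doubled <- unlist(doubled)
```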

Keep a core spare for the system.

Use mc.cores = detectCores()-1.
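A slightly more defensive version of that setting might look like the sketch below. The guards are my own additions: detectCores() can return NA on some systems, and a single-core machine would otherwise end up with zero cores.

```r
library(parallel)

n_detected <- detectCores()

# detectCores() may return NA if the count cannot be determined.
if (is.na(n_detected)) n_detected <- 1L

# Leave one core for the system, but never drop below one.
n_cores <- max(1L, n_detected - 1L)

# Forking is not available on Windows, so mclapply needs mc.cores = 1 there.
if (.Platform$OS.type == "windows") n_cores <- 1L

results <- mclapply(1:4, function(i) i * 2, mc.cores = n_cores)
```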

Keep an eye on core activity and memory with the Linux/OS X “top” command.

Break up a data frame using split: split(df, sample(1:N, nrow(df), replace = TRUE)), where N is the number of chunks you want.

Process the chunks separately. Then reassemble them using rbindlist() from the data.table package.
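Put together, the split, process and reassemble steps could look something like this. It’s a sketch that assumes the data.table package is installed; the data frame, the squared column and N = 4 are placeholders.

```r
library(parallel)
library(data.table)

# Forking is not available on Windows, so fall back to a single core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

set.seed(42)
df <- data.frame(id = 1:20, x = rnorm(20))  # hypothetical data
N <- 4                                      # number of chunks (an assumption)

# Randomly assign each row to one of N chunks.
chunks <- split(df, sample(1:N, nrow(df), replace = TRUE))

# Process each chunk in its own task; the work here is a stand-in.
processed <- mclapply(chunks, function(chunk) {
  chunk$x2 <- chunk$x^2
  chunk
}, mc.cores = n_cores)

# Stack the processed chunks back into a single table.
result <- rbindlist(processed)
```

Note that the rows come back grouped by chunk rather than in their original order, so keep an id column if you need to re-sort afterwards.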