Hi Chris, hi all,
just to confirm that the aborted-workers issue was related to the total amount of memory available to the root worker.
A submission file as follows

with MATLAB running on all 20 CPUs, no longer produces any aborted workers, since there is a sufficient amount of additional memory allocated to the root CPU. I couldn't find another way to allocate more memory to the master than to the other workers.
Previously, workers crashed and the calculation only continued as long as the remaining memory was sufficient.
By the way, I can confirm that the following scheme,
mystruct = struct([]);  % initialiser truncated in the original post; an empty struct is assumed here
poolobj = gcp;
parfor id_p = 1:Np
    struct_temp = myfunction(id_p, 'WorkSpace.mat');
    mystruct(id_p) = struct_temp;
end
where myfunction loads the workspace, solves the transparency issue.
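For context, the myfunction pattern can look something like the sketch below. The function name some_calculation is a placeholder, not the actual routine; the point is only that calling load inside a function, with an output struct, satisfies parfor's transparency rules, because no variables appear in the parfor body without an explicit assignment:

```matlab
function out = myfunction(id_p, ws_file)
% Hypothetical sketch: load the saved workspace inside the worker function.
% Capturing load's result in a struct keeps parfor transparency intact,
% since no new variables are created invisibly in the loop body.
ws = load(ws_file);                 % ws is a struct holding the saved variables
out = some_calculation(id_p, ws);   % placeholder for the actual computation
end
```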
The memory issue stems from the fact that each worker needs only 1 to 2 GB, no more, while mystruct can exceed 20 GB, so the root needs much more memory than the workers.
Thanks for your time
In this post Patrizio Graziosi wrote:
in the end I saved the workspace and attached it to the parpool
poolobj = gcp;
WorkersConstant = parallel.pool.Constant('WorkSpace.mat');
parfor id_E = 1:nE
    for id_n = 1:n_bands_transp
        [tau_temp, tau_matth_temp, tau_IIS_temp] = tau_calc_funct_v3(id_E, id_n, 'WorkSpace.mat'); % the big tau_calc routine, the actual tau_calc in the serial version
        taus(id_E, id_n) = tau_temp;
        taus_matth(id_E, id_n) = tau_matth_temp;
    end
end
and all the other files are lumped into a function with subfunctions that load the whole workspace.
It's probably not the best practice but it works for now.
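Incidentally, the parallel.pool.Constant created above is never actually read in the loop. A minimal sketch of how it could be used, assuming tau_calc_funct_v3 were adapted to accept the loaded struct instead of a file name, so the workspace is loaded once per worker rather than once per iteration:

```matlab
poolobj = gcp;
% The function handle runs once on each worker, caching the loaded workspace there
WorkersConstant = parallel.pool.Constant(@() load('WorkSpace.mat'));
parfor id_E = 1:nE
    for id_n = 1:n_bands_transp
        ws = WorkersConstant.Value;  % struct with the saved variables, no per-iteration load
        [tau_temp, tau_matth_temp, tau_IIS_temp] = tau_calc_funct_v3(id_E, id_n, ws);
        taus(id_E, id_n) = tau_temp;
        taus_matth(id_E, id_n) = tau_matth_temp;
    end
end
```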
The problem is that when I run it on a cluster I get a lot of aborted workers... Can you support me on this, or shall I raise another topic?