Open
Labels
backlog: Waiting to be resolved
Description
CPU resource utilization metrics for MPI jobs on SLURM systems appear to be inaccurate. The suspected cause is that psutil
is not identifying the correct child processes of the main job:
executor _processes dict_values([<Popen: returncode: None args: ['mpiexec', '-n', '20', 'fds', 'office_atria_...>])
child psutil.Process(pid=2546114, name='srun', status='sleeping', started='14:29:20')
child psutil.Process(pid=2546115, name='srun', status='sleeping', started='14:29:20')
child psutil.Process(pid=2546115, name='srun', status='sleeping', started='14:29:20')
The expected output is a set of FDS child processes, one for each of the 20 cores the job is running across; instead, only srun processes appear.
...
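To help confirm the behaviour described above, here is a minimal diagnostic sketch that tallies the names of every descendant of a process. The helper name `count_descendants_by_name` is hypothetical and not part of this project. It assumes only that the object passed in behaves like a `psutil.Process`, i.e. it provides `children(recursive=True)` and each child provides `name()`.

```python
from collections import Counter


def count_descendants_by_name(proc):
    """Tally the names of all descendants of `proc`.

    `proc` is assumed to behave like psutil.Process: it must provide
    children(recursive=True), and each child must provide name().
    This helper is illustrative only and not part of the project.
    """
    names = Counter()
    for child in proc.children(recursive=True):
        try:
            names[child.name()] += 1
        except Exception:
            # A child may exit between listing and inspection
            # (psutil raises NoSuchProcess in that case); skip it.
            continue
    return names
```

With psutil installed, calling `count_descendants_by_name(psutil.Process(popen.pid))` on the executor's `Popen` object should show whether any `fds` entries appear at all. If only `srun` is counted, the FDS ranks are presumably not descendants of the `mpiexec` process, which would be consistent with the output shown in this issue.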