From the log for job 519833 on A64FX (for EESSI/software-layer#1146):
== limiting parallelism to 4 (was 16) for GCCcore on aarch64/a64fx to avoid out-of-memory failures during building/testing
== limiting parallelism to 1 (was 4) for GCCcore on aarch64/a64fx to avoid out-of-memory failures during building/testing
== limiting parallelism to 4 (was 16) for GCC on aarch64/a64fx to avoid out-of-memory failures during building/testing
That doesn't look right, and definitely explains why the build is taking such a long time (close to 10h for GCCcore/13.3.0 😱).
The bot is configured to request a full A64FX node (48 cores), and then we configure EasyBuild to only use a quarter of the available cores (because of the limited amount of HBM memory, 32GB).
Since EasyBuild v5.0.0, the default EasyBuild configuration is to use max. 16 cores (see easybuilders/easybuild-framework#4816).
We pick up on that in the hooks, and then take a quarter of that (rather than the full 48 cores), resulting in 4 rather than the intended 12.
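To make the arithmetic concrete, here is a minimal sketch in plain Python. It is not the actual hook code: `quarter_parallelism` and the variable names are hypothetical, and only the numbers (48 cores, cap of 16, factor of 4) come from the description above.

```python
# Hypothetical helper: take a quarter of the given core count, never below 1.
def quarter_parallelism(cores):
    return max(1, cores // 4)

node_cores = 48      # full A64FX node requested by the bot
eb_default_cap = 16  # default maximum parallelism since EasyBuild v5.0.0

# What we intended: a quarter of the cores in the node.
intended = quarter_parallelism(node_cores)

# What happens now: the hook sees the capped value and quarters that instead.
actual = quarter_parallelism(min(node_cores, eb_default_cap))

print(f"intended: {intended}, actual: {actual}")
```

Under these assumptions the intended value is 12 while the value actually used is 4, which matches the "limiting parallelism to 4 (was 16)" message in the log.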
The make -j 4 bootstrap command in the build step (1st iteration) of the GCCcore installation takes 5h46min; that would probably be a lot faster with the intended 12 cores...
So maybe we should just disable the hook that limits parallelism to a quarter of the default parallelism when using EasyBuild 5.x, and only deviate from the default when really needed (when the build fails due to lack of memory)...
What's also bad is that the 2nd "limiting parallelism to 1 (was 4) for GCCcore" message shows that we're not careful when limiting parallelism through our hook: GCCcore is an iterated build, and in the 2nd iteration we're again reducing the level of parallelism to just a quarter (so 4 -> 1).
That's mainly painful in the 3rd iteration, where the build step uses plain make (so a single core...) and takes 3.5 hours.
We should only tweak the parallelism during the 1st iteration, and then not touch it again.
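A rough sketch of the intended behaviour, again in plain Python rather than the real hook: `limit_parallelism`, `simulate` and the `iter_idx` argument are hypothetical names used only for illustration, and the way the real hook would learn the current iteration is left open here.

```python
# Hypothetical guard: only reduce parallelism in the first iteration (iter_idx == 0).
def limit_parallelism(current, iter_idx, divisor=4):
    if iter_idx > 0:
        # later iterations: keep whatever was set during the first one
        return current
    return max(1, current // divisor)

# Replay an iterated build and record the parallelism used in each iteration.
def simulate(start, iterations, guarded):
    parallel = start
    history = []
    for idx in range(iterations):
        # without the guard, every iteration is treated like the first one
        parallel = limit_parallelism(parallel, idx if guarded else 0)
        history.append(parallel)
    return history

unguarded = simulate(16, 3, guarded=False)
guarded = simulate(16, 3, guarded=True)
print("unguarded:", unguarded)
print("guarded:  ", guarded)
```

Without the guard the value is quartered again on each pass (16 -> 4 -> 1), which is the pattern seen in the log; with the guard it stays at the value chosen in the 1st iteration.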